University  of  Alberta 


CONCERT  HALL  ACOUSTICS 
AND  THEIR  EFFECT  ON  CHORAL  DICTION 


by 


Alison  Norris 


An  essay  submitted  in  partial  fulfillment  of  the  requirements 
for  the  degree  of 

Master  of  Music 
in 

Choral  Conducting 


Department  of  Music 
Edmonton,  Alberta 
Fall  2012 


Abstract 


Clear  communication  with  the  audience  is  surely  one  of  the  primary  goals  of  all 
choral  ensembles.  Yet  when  choirs  perform,  the  words  they  sing  are  often  difficult  to 
decipher.  When  they  sing  in  beautiful  ‘live’  acoustic  spaces,  comprehension  of  text 
becomes  an  even  greater  challenge,  and  yet  those  large  reverberant  performance  venues 
are  specifically  designed  to  enhance  the  experience  of  live  musical  performance. 

Choir  conductors  employ  a  variety  of  methods  to  deal  with  the  problem  of  choral 
diction,  yet  many  of  these  methods  are  not  scientifically  based  and  their  effectiveness  has 
not  been  examined.  This  study  seeks  to  provide  conductors  with  some  of  the  acoustic  and 
phonetic  knowledge  that  will  help  to  elevate  the  level  of  textual  clarity  to  the  highest 
possible  degree  with  their  ensembles. 

Through  the  use  of  computer  simulation  and  listener  feedback,  this  essay 
explores  the  key  difficulties  in  achieving  clear  diction  in  different  types  of  performance 
spaces. Phonetic analysis helps to provide alternative solutions to diction problems and to
evaluate  the  most  common  and  current  methods  for  diction  improvement  in  choirs. 

Room acoustics modeling software was employed to artificially create four
distinct  performance  venues,  each  with  its  own  acoustical  characteristics.  Several 
excerpts  of  choral  music  were  then  virtually  performed  in  each  of  the  four  halls.  These 
computer-generated  sound  files  were  heard  by  unbiased  listeners  to  determine  the  clarity 
of  text  in  each  of  the  acoustical  conditions.  The  listener  data  yielded  text  intelligibility 
rates  of  less  than  fifty  percent  for  all  types  of  performance  venues,  and  identified  plosives 
and  nasal  consonants  as  the  most  problematic  in  more  reverberant  spaces. 


Once  identified,  these  problematic  speech  components  were  examined  using  a 
type  of  analysis  called  a  spectrogram,  a  visual  representation  of  the  patterns  of  speech. 
The  spectrogram  analysis  helped  point  to  kinesthetic  synchronization  as  a  key  method  of 
resolution  for  these  problems.  Kinesthetic  synchronization  involves  teaching  singers  to 
move  their  mouths  and  vocal  articulators  in  the  same  way  and  at  the  same  time.  This 
solution  is  compared  with  current  popular  teaching  strategies  for  choral  diction, 
demonstrating many of these popular methods to be ineffective and even counterproductive.

In  order  to  provide  additional  methods  for  improving  text  intelligibility,  the  paper 
also  discusses  ways  in  which  choirs  can  directly  assist  the  audience  in  understanding  the 
words.  In  addition,  there  is  a  brief  discussion  of  possible  acoustical  enhancements  of 
performance  spaces  for  improved  clarity  of  text. 




Acknowledgements 


The  author  wishes  to  acknowledge  Dr.  Laurier  Fagnan  for  his  support  and  interest 
in  this  research,  and  his  generosity  with  his  expertise  and  resources  which  were  vital  to 
this  project.  Thank  you  for  asking  and  answering  important  questions,  and  for  always 
making  time  for  me. 

Dr.  Debra  Cairns,  I  knew  I  was  always  in  good  hands  with  you  as  my  supervisor. 
You  had  all  the  details  worked  out,  and  you  cared  enough  to  also  be  concerned  for  my 
personal  well-being  in  this  process. 

I  would  like  to  especially  thank  Kiyoshi  Kuroiwa,  who  took  the  time  to  run  the 
CATT  Acoustic  convolutions  for  the  computer  simulation  portion  of  this  thesis.  Thank 
you,  also,  to  John  O'Keefe  who,  along  with  Kiyoshi,  provided  the  valuable  insight  and 
advice  that  only  an  acoustician  can  give. 

To  my  dear  husband,  Jonathan:  thank  you  for  loving  me  through  the  long  hours, 
for  delicious  dinners  and  frozen  treats  to  spur  me  on,  and  for  always  believing  that  I  could 
complete  my  work  with  excellence.  Thank  you  for  blazing  a  trail  for  me,  for  your  advice, 
and  for  your  kindness.  And  to  our  dear  sweet  little  Sammy,  thank  you  for  being  such  a 
good  baby  boy,  not  being  too  fussy  or  demanding  so  that  mommy  could  still  finish  her 
Masters. 

Thank  you  to  my  creator,  my  saviour,  my  friend,  for  providing  everything  that  I 
need,  for  all  of  these  previously  mentioned  people  that  have  been  so  wonderful,  for 
making  these  plans  for  me,  and  for  seeing  it  through  to  completion. 




Table  of  Contents 


Chapter 1 - Introduction
Methodology
Contribution

Chapter 2 - Acoustics
Reverberation Time
Anechoic Chamber
Acoustic Strength or Loudness
Early and Late Reflected Energy
Clarity
Summary of Effect of Acoustical Factors on Text Intelligibility

Chapter 3 - Acoustical Modeling and Listener Feedback
CATT Modeling
Listener feedback

Chapter 4 - Analysis: Diction Problems and Solutions
Spectrograms
Plosives
Nasals
Fricatives
Approximants
Summary
Supporting Research on Kinesthetic Synchronization

Chapter 5 - Other Considerations for Text Intelligibility

Chapter 6 - Conclusion and Future Work

Bibliography

Appendix A - CATT Acoustic Modeling and Predictions
Three-Dimensional Models
Results of Computer Simulation

Appendix B - Acoustic Modeling Listening Examples

Appendix C - Listener Feedback
Feedback form
Raw data
Graphs


List  of  Tables 


Table  2.1 

Reverberation  Time  of  Various  Performance  Spaces 

Table  3.1 

Seating  Capacity  and  Volume  of  Three  Performance  Venues 

Table  3.2 

Acoustical  Characteristics  of  Four  Performance  Spaces, 
as  determined  by  CATT  Acoustic 

Table  3.3 

Phrase  and  Venue  Combinations  for  Each  Listener  Set 

Table  3.4 

Intelligibility  Results  of  Listener  Feedback  for  Five  Performance  Spaces 

Table  4.1 

Intelligibility  of  Consonant  Sounds  in  Various  Performance  Venues 

Table  4.2 

Overview  of  Common  Diction  Problems  and  Solutions 

List  of  Figures 


Figure 2.1

Anechoic  Chamber  of  Canada’s  NRC 
*  refer  to  track  1,  Appendix  B  for  anechoic  choral  recording 

Figure  3.1 

Simon  Fraser  University  Experimental  Theatre 

mapped  in  three  dimensions  in  CATT  Acoustic,  viewed  from  above 

Figure  3.2 

Queen  Elizabeth  Theatre,  Vancouver  BC,  as  modeled  in  CATT  Acoustic, 
viewed  from  stage 

*  refer  to  track  3,  Appendix  B  for  simulated  choral  performance  in  QET 

Figure  3.3 

Esplanade  Theatre,  Medicine  Hat  AB  as  modeled  in  CATT  Acoustic, 
viewed  from  stage 

*  refer  to  track  2,  Appendix  B  for  simulated  choral  performance  in 
Esplanade  Theatre 

Figure  3.4 

Simon Fraser University Experimental Theatre, Vancouver BC as modeled
in CATT Acoustic, viewed from stage, without curtains (left) and with
curtains (right)

* refer to tracks 4 and 5, Appendix B for simulated choral performance in
Experimental Theatre, without curtains and with added curtains,
respectively

Figure  4.1 

Spectrogram of “sigh” [saɪ] spoken by a single voice

Figure 4.2

Spectrogram of “buy, die, guy” [baɪ daɪ gaɪ] spoken by a single voice

Figure 4.3

Spectrogram of St. Olaf Cantorei choir singing “speaks such” [spiks sʌtʃ]
on the left, in anechoic chamber; on the right, in the Queen Elizabeth
Theatre

Figure 4.4

Spectrogram of “ram, ran, rang” [ræm ræn ræŋ] spoken by a single voice

Figure 4.5

St. Olaf Cantorei Choir singing “son” [sʌn] in anechoic chamber

Figure 4.6

St. Olaf Cantorei Choir singing “son” [sʌn] in SFU’s Experimental
Theatre without curtains

Figure 4.7

Spectrogram of “fie, thigh, sigh, shy” [faɪ θaɪ saɪ ʃaɪ] spoken by a
single voice

Figure 4.8

Spectrogram of “this who speaks such” [ðɪs hu spiks sʌtʃ] sung by St.
Olaf Cantorei Choir in anechoic chamber

Figure 4.9

Spectrogram of “this who speaks such” [ðɪs hu spiks sʌtʃ] sung by St.
Olaf Cantorei Choir in Medicine Hat Esplanade Theatre

Figure 4.10

Spectrogram of “why, lie, rye” [waɪ laɪ raɪ] spoken by a single voice

Figure 4.11

Spectrograms of “some prophet” [sʌm prɔfɪt] sung by St. Olaf Cantorei
Choir. On the left, in anechoic chamber. On the right, in Medicine Hat’s
Esplanade Theatre.

Chapter  1  -  Introduction 


In  the  world  of  choral  music,  communication  with  the  audience  is  one  of  the 
central  goals.  A  key  part  of  that  communication  is  the  ability  of  the  audience  to 
understand  the  words  being  sung.  Creating  clear  diction  is  a  challenge  that  all  choirs  have 
in  common  and  an  objective  towards  which  they  all  strive.  The  difficulty  of 
communicating  words  to  the  audience  is  compounded  not  only  by  the  multiplicity  of 
voices  and  parts  in  a  choral  context,  but  also  by  the  acoustical  effects  of  performance 
spaces. 

The  combined  study  of  choral  music  and  acoustics  has  thus  far  received  relatively 
little  attention  from  researchers.  However,  since  choral  music  making  involves  millions 
of  people  worldwide,  and  since  communication  is  one  of  the  key  aims  of  choral 
musicians  and  conductors,  such  an  exploration  is  timely  and  worthwhile. 

In  the  field  of  choral  music  making,  there  are  many  misconceptions  about  how 
acoustics  can  affect  a  musical  performance.  In  this  essay,  the  topic  of  choral  diction  will 
be  explored  from  an  acoustical  perspective  in  order  to  answer  the  following  questions: 
which  consonant  sounds  in  performance  halls  are  most  problematic;  and  from  a 
conductor’s  viewpoint,  what  are  the  best  ways  of  instructing  a  choir  in  order  to  improve 
the  diction?  This  study  will  help  to  inform  choral  conductors,  choristers,  and  even  choral 
composers,  about  the  nature  of  acoustics  and  text  intelligibility,  and  thus  lead  to  new  and 
important  approaches  to  choral  diction. 

Many, if not all, choral conductors have asked their singers for clearer diction at
one time or another. The difficulty with choral diction is not just the fact that so many
voices need to sing in unison. Performance hall acoustics also add to the loss of clarity.
Unfortunately, even when perfectly clear diction is produced by a choir, much of this
clarity is lost in a large concert hall. This disappointing fact about room acoustics will be
objectively demonstrated by artificially generating identical performances of certain
choral excerpts in rooms of various sizes and shapes.

Although  clear  diction  is  not  easily  attainable  in  a  large  performance  space,  it  is 
still  the  goal  of  any  choir  to  achieve  as  much  clarity  as  possible.  The  most  common 
methods  used  to  achieve  clear  diction  usually  involve  exaggeration  of  consonant 
articulation  or  an  increase  in  volume  of  consonant  sounds.  Though  these  methods,  and 
others,  are  helpful,  many  do  surprisingly  little  to  clarify  text,  and  in  some  cases  further 
exacerbate  the  problem.  I  will  argue  that  in  order  to  overcome  the  detrimental  effects  of 
room  acoustics  on  choral  diction,  choirs  must  learn  kinesthetic  synchronization  of  the 
vocal  articulators:  that  is,  all  singers  must  learn  to  move  the  mouth,  lips,  tongue,  etc.  in 
unison,  and  all  in  the  same  way. 

Methodology 

Professional room acoustics software called CATT Acoustic¹ will be used to
model the dimensions and acoustical properties of four different concert halls of specific
size, shape and acoustical characteristics.

For the purposes of this study, we will use a recording produced by Wenger
Corporation.² This recording features the St. Olaf Cantorei choir performing in an
anechoic chamber.³ A modern, sacred English piece entitled “Who Is This” was chosen
for acoustical modeling because of its simplicity (few overlapping lines or polyphony)
and simple English text. Using this anechoic recording, the computer will simulate the
overall sound of the piece as it would be heard by an audience member within each of
four selected concert halls: Vancouver’s Queen Elizabeth Theatre, Medicine Hat’s
Esplanade Theatre, and Simon Fraser University’s Experimental Theatre, with and
without curtains. These simulations, as well as the original recording in the anechoic
chamber, will then be played in varied orders for various independent listeners (both
musicians and non-musicians) for their feedback on textual clarity. The common diction
problems, as identified by the listeners, will then be visually analysed using
spectrograms, computer-generated visual representations of speech. The spectrograms
will reveal how and why the acoustical characteristics affect diction clarity, and will help
to provide new teaching methods specifically designed to overcome diction problems that
result from concert hall acoustics.


1 CATT Acoustic stands for ‘Computer Aided Theatre Technique’. It combines CAD design software with
computer aided acoustics auralization and prediction.

2 Anechoic Choral Recordings. St. Olaf Cantorei Choir, conducted by Dr. John Ferguson. Recorded
October 21, 2003. Owatonna, MN: Wenger Corporation, 2004. 1 compact disc, 1 data dvd.

Contribution 

This  research  has  the  potential  to  positively  impact  choirs  and  contribute 
significantly  to  choral  education  techniques.  It  will  inform  conductors  of  some  acoustic 
realities  that  have  a  negative  effect  on  diction,  and  provide  them  with  some  pedagogical 
principles  to  counteract  these  natural  pitfalls,  thereby  enhancing  the  musical  experience 
for  audience  members.  The  conclusions  presented  in  this  essay  will  have  the  potential  to 
change  current  standard  approaches  to  choral  diction. 


3  See  page?  for  definition  of  anechoic  chamber. 




Chapter  two  provides  a  brief  discussion  of  acoustics,  the  reasons  for  a  warm  and 
reverberant  choral  sound,  and  how  this  reverberance  competes  with  diction  clarity.  Some 
important  acoustical  terms  will  be  introduced. 

Chapter  three  presents  an  overview  of  the  computer  modeling  technology  that 
simulates  the  choral  performance  in  various  acoustical  spaces. 

In  chapter  four,  listener  feedback  will  help  to  identify  which  types  of  consonant 
sounds  are  most  problematic.  Visual  graphs  of  speech,  called  spectrograms,  will  help  to 
identify  reasons  for  these  diction  problems  in  various  acoustical  spaces. 

Chapter  five  provides  an  impression  of  diction  from  the  audience’s  perspective 
and  discusses  means  by  which  the  listener  can  be  aided  in  understanding  the  choral  text. 
It  also  deals  with  acoustical  considerations  such  as  room  size  and  absorption,  placement 
of  choir  and  audience  within  the  room,  layout  of  the  choir,  and  the  use  of  choir  shields 
and other reflectors. This chapter will serve as a launching point to further research in the
field  of  choral  acoustics. 




Chapter  2  -  Acoustics 


Choirs  tend  to  perform  in  large,  reverberant  halls  for  many  reasons:  richness  of 
sound,  aesthetic  beauty,  and  space  for  the  audience,  to  name  just  a  few.  Although  large 
concert  halls,  cathedrals  and  other  reverberant  performance  venues  are  beautiful  and  add 
to  the  musical  experience  in  many  ways,  these  venues  are  no  friend  of  choral  diction. 
Before  discussing  the  effects  of  acoustics  on  choral  diction,  some  key  principles  of 
acoustics  that  come  into  play  with  choral  diction  will  first  be  addressed,  many  of  which 
tend  to  be  misunderstood  by  musicians  and  audience  members. 

Reverberation  Time  (RT) 

When people think of large concert halls and cathedrals, they often use the term
‘reverberance’ to describe the sound that lingers after the original sound is produced.
Reverberation time, RT, is one of the most important acoustical factors in any live
performance venue. RT is defined as the length of time from when a sound is first
produced until it becomes inaudible. Technically speaking, it is the length of time it takes
for a sound to decrease by 60 dB (that is, for the sound power to decrease by a factor of 10⁶, or one million).⁴
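
The size of a 60 dB drop can be checked with a short calculation. The sketch below is an illustration added for the reader rather than part of the study's method, and the function name is hypothetical. It relies only on the standard definition of the decibel as ten times the base-10 logarithm of a power ratio.

```python
def power_ratio_for_drop(drop_db: float) -> float:
    """Factor by which sound power falls for a level drop of `drop_db` decibels."""
    # Decibels are ten times the base-10 logarithm of a power ratio,
    # so the ratio is recovered by inverting that relation.
    return 10 ** (drop_db / 10)


# The 60 dB decay used to define reverberation time:
rt_ratio = power_ratio_for_drop(60)

# A 10 dB drop, for comparison, is a tenfold reduction in power:
ten_db_ratio = power_ratio_for_drop(10)
```

Each additional 10 dB multiplies the ratio by ten, which is why a 60 dB decay corresponds to six successive tenfold reductions in sound power.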

Sound waves that are produced within a room will continue to bounce freely
around the room, decreasing in energy as they become absorbed by objects within the
room. This means that smaller rooms with more soft, absorptive surfaces will attenuate
sound energy more quickly (and have a lower RT), and rooms that are larger with harder
surfaces will take longer to absorb the energy (higher RT), since the sound waves take
longer to reach the room’s surfaces. Rooms with a longer RT are said to have a ‘wet’,
‘live’, or ‘reverberant’ acoustic, whereas rooms with a shorter RT have a ‘dry’ or ‘dead’
acoustical quality. The ideal RT of a room depends on the intended use of the space. A
drama theatre, for example, might have an RT of 0.9 seconds, while a large concert hall
would typically have an RT of just over 2 seconds, and a large cathedral of over 10
seconds.⁵


4 Murray Campbell, "Reverberation time" in Grove Music Online (Oxford University
Press, 2007-) accessed June 19, 2012,
http://www.oxfordmusiconline.com.login.ezproxy.library.ualberta.ca/subscriber/article/grove/music/23282.


Table 2.1 - Reverberation Time of Various Performance Spaces⁶

Location                                 RT (s)
Outdoors                                 0.0
Average Living room or Bedroom           0.4
Theatre for Speech                       0.9
Edmonton’s Jubilee Auditorium            1.4
Toronto’s Roy Thomson Hall               1.8
Vancouver’s Queen Elizabeth Theatre      1.8
Edmonton’s Winspear Centre⁷              2.2
St. Paul’s Cathedral                     13

Another important aspect of reverberation is frequency. RT values, such as those
given in Table 2.1, are generally based on mid-range frequencies within the room (500
Hz or 1 kHz). However, there is a whole spectrum of sound frequencies below and above
this range, and each frequency has its own RT. Higher and lower range RT values would
be consulted in order to determine the source of acoustical anomalies. Since we will be
making use of professional halls for the simulations of this research, we need not be
concerned with RT in the high or low frequency ranges. The mid-frequency RT values will
give a good overall reverberation time for our purposes.


5 Bridget Shield and Trevor Cox, “Concert Hall Acoustics: Arts and Science” from
University of Salford Manchester, Acoustics, 1999-2000,
http://www.acoustics.salford.ac.uk/acoustics_info/concert_hall_acoustics/?content=reverb.

6 Ibid.

7 Timothy J. Foulkes, Halls for Music Performance: Another Two Decades of Experience 1982 - 2002
(Melville NY: Acoustical Society of America, 2003), 150.

The  construction  materials  of  a  concert  hall  have  a  huge  impact  on  reverberation. 
Most  performance  spaces  are  made  of  wood  or  concrete  or  other  hard  surfaces  in  order  to 
reflect sound and increase reverberation. Some venues have retractable curtains that can be extended
in order to reduce RT for increased musical or textual clarity. Seats are usually made of
absorptive  materials  in  order  to  mimic  the  absorption  of  human  beings,  which  helps  to 
minimize  the  acoustic  difference  between  an  empty  and  full  hall. 
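
The dependence of RT on room volume and surface absorption can be illustrated with Sabine's formula, a classical approximation that is not used or cited in this essay: RT ≈ 0.161 V / A, where V is the room volume in cubic metres and A is the sum of each surface area multiplied by its absorption coefficient. The sketch below is only an illustration. The hall dimensions and absorption coefficients are hypothetical values chosen for the example and do not describe any venue in this study.

```python
def sabine_rt60(volume_m3, surfaces):
    """Estimate reverberation time (s) with Sabine's formula.

    surfaces: iterable of (area_m2, absorption_coefficient) pairs, where the
    coefficient runs from 0 (fully reflective) to 1 (fully absorptive).
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / total_absorption


# Hypothetical 20 m x 15 m x 10 m hall (3,000 cubic metres).
VOLUME = 20 * 15 * 10
FLOOR = CEILING = 20 * 15
WALLS = 2 * (20 * 10) + 2 * (15 * 10)

# All surfaces hard and reflective (illustrative coefficient).
hard_hall = [(FLOOR, 0.05), (CEILING, 0.05), (WALLS, 0.05)]

# Same hall with absorptive seating on the floor and curtains on the walls.
treated_hall = [(FLOOR, 0.60), (CEILING, 0.05), (WALLS, 0.40)]

rt_hard = sabine_rt60(VOLUME, hard_hall)
rt_treated = sabine_rt60(VOLUME, treated_hall)
```

In this sketch the treated hall has the shorter reverberation time, and doubling the volume while keeping the same surfaces doubles the estimate, which matches the qualitative behaviour described above.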

Reverberation  is  a  very  desirable  characteristic  for  a  live  music  performance 
venue  because  it  adds  a  pleasant  quality  of  richness  to  the  sound.  This  study  will  show, 
however,  that  while  reverberation  is  helpful  for  the  overall  musical  experience,  it  is  a 
detriment  to  clear  diction.  An  appropriate  balance  between  RT  and  clarity  must  be  found 
for  ideal  choral  performances  that  are  both  aesthetically  beautiful  and  textually  clear. 

Anechoic  Chamber 

An  anechoic  chamber  is  a  room  designed  to  have  as  little  reverberation  as 
possible  (an  RT  as  close  to  zero  as  possible)  through  the  use  of  highly  absorptive 
materials  on  all  surfaces  (ceiling,  walls  and  floor)  and  is  an  important  scientific  tool  for 
acoustical  research. 




Figure 2.1 - Anechoic Chamber of Canada’s NRC⁸


The purpose of such a room is to provide the necessary conditions for an echo-free
recording that can be used for modeling the sound in other acoustical spaces, as well
as  for  analyzing  sound  as  it  is  produced,  without  the  addition  of  any  reflected  sound. 
Further  information  on  the  acoustical  modeling  using  an  anechoic  chamber  will  be 
provided  in  chapter  three  in  the  discussion  of  computer  simulation  of  acoustical  spaces. 

Acoustical  Strength  or  Loudness  (G) 

Loudness, or Acoustical Strength, G, is another important factor to consider in
choral performances. G is a relative quantity, in dB, which compares the amount of sound
energy that the listener perceives within a certain acoustical space with the sound energy
that would be heard ten meters from the same source in an anechoic chamber.⁹ In general,
higher acoustical strength makes words easier to decipher, just as speaking more loudly
helps a speaker to be understood more clearly. Acoustical strength is usually
higher in smaller performance halls, and in halls with very hard surfaces where there is a
greater propensity for reflected sound energy.


8 Anechoic Chamber of National Research Council of Canada. Credit: NRC-CNRC Harry Turner.
http://www.nrc-cnrc.gc.ca/eng/multimedia/anechoic-chamber.htm

9 Leo Beranek, Concert and Opera Halls: How They Sound (Woodbury, NY: American Institute of
Physics, c1996), 30.

Early  and  Late  Reflected  Energy 

‘Early  energy’  refers  to  the  amount  of  sound  energy  that  arrives  at  the  ear  soon 
after  the  sound  is  produced  (within  80  milliseconds).  Early  energy  includes  direct  energy, 
sound  waves  that  travel  directly  from  the  source  to  the  receiver  without  being  reflected  at 
all,  and  sound  waves  that  are  reflected  off  of  one  or  perhaps  two  relatively  near  surfaces 
before  they  reach  the  listener. 

‘Late  reflected  energy’  refers  to  sound  energy  that  takes  longer  to  travel  from  the 
source  to  the  receiver  because  it  has  been  reflected  several  times,  or  has  bounced  around 
the  room  off  various  objects  before  reaching  the  listener.  The  proportion  of  early  and  late 
energy  in  the  total  sound  energy  is  important  in  determining  the  clarity  of  the  sound.  If 
the  majority  of  the  sound  energy  that  the  listener  detects  is  ‘early  energy’,  the  text  will  be 
much easier to decipher. On the other hand, if the sound reaching the listener contains no
‘late  energy’,  the  sound  will  be  very  dry,  lacking  the  acoustical  richness  of  reverberant 
sound. 

Clarity  (C-80) 

Clarity in concert halls is a measure of the perceived clarity of sound. It is a ratio
of the early reflected energy to the late reflected energy. Technically speaking, it is a ratio
of the sound energy arriving within the first 80 milliseconds (ms) to the sound energy
arriving at the listener after 80 ms. According to Beranek, any sounds arriving at the
listener within 80 ms of one another are grouped together by the brain as a single sound
event.¹⁰ However, sound arriving after 80 ms contributes to noise, and muddying of the
sound. Clarity is higher if most of the energy reaching the listener is direct or early
reflected energy. The later energy, from late reflections, obscures the clarity of music and
text, but adds to reverberation and acoustic warmth. A clarity value greater than or equal
to ‘one’ is best for textual intelligibility.
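
The early-to-late ratio can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not the procedure used by CATT Acoustic: it expresses the ratio in decibels, as is conventional for C80 (ten times the base-10 logarithm of early energy over late energy), it assumes the impulse response begins at the arrival of the direct sound, and it uses idealized exponential decays rather than any hall from this study. The function names and the sample rate are hypothetical.

```python
import math


def c80_db(impulse_response, sample_rate_hz, split_ms=80.0):
    """Clarity in dB: energy in the first 80 ms over energy after 80 ms.

    Assumes the first sample of `impulse_response` is the arrival of the
    direct sound at the listener.
    """
    split = int(round(split_ms / 1000.0 * sample_rate_hz))
    energies = [sample * sample for sample in impulse_response]
    early = sum(energies[:split])
    late = sum(energies[split:])
    if late == 0:
        return math.inf   # no late energy at all: a completely dry response
    if early == 0:
        return -math.inf  # no early energy at all
    return 10 * math.log10(early / late)


def synthetic_decay(rt_s, sample_rate_hz, duration_s):
    """Idealized impulse response whose energy falls by 60 dB every `rt_s` seconds."""
    n = int(duration_s * sample_rate_hz)
    # A 60 dB energy decay is a factor of 1,000 in amplitude per RT.
    return [10 ** (-3.0 * i / (sample_rate_hz * rt_s)) for i in range(n)]


FS = 8000  # samples per second, chosen only to keep the example small

c80_dry = c80_db(synthetic_decay(0.7, FS, 3.0), FS)   # short reverberation time
c80_live = c80_db(synthetic_decay(2.0, FS, 6.0), FS)  # long reverberation time
```

For these idealized decays the shorter reverberation time yields the higher clarity value, which is the trade-off between reverberance and intelligibility described in this chapter.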

Summary  of  Effect  of  Acoustical  Factors  on  Text  Intelligibility 

Diction  is  positively  affected  by  G  (acoustical  strength)  and  negatively  affected 
by  RT  (reverberation  time).  Textual  intelligibility  is  measured  by  C-80  (Clarity).  Direct 
sound  and  early  reflections  are  helpful  for  text  intelligibility,  while  late  reflections  are  a 
hindrance.  Clarity  of  sound  is  further  hindered  in  choral  singing  because  of  multiple 
voices  that  may  individually  be  singing  in  slightly  different  ways  and  at  slightly  different 
times.  Reverberation  further  reduces  clarity  by  reflecting  the  sound  and  scattering  it 
through  time,  adding  to  unintelligible  noise  that  becomes  part  of  the  listening  experience. 

One  of  the  primary  goals  of  any  choral  performance  is  to  communicate  text 
clearly  to  the  audience.  The  performance  is  enhanced  by  the  acoustical  beauty  of  more 
reverberant  sound.  Our  goal  as  musicians,  then,  is  to  find  methods  of  improving  the 
clarity  of  text  by  working  with,  rather  than  against,  the  acoustical  properties  of  the 
performance  space,  and  thus  achieve  an  effective  and  satisfying  balance  in  order  to 
experience  both  textual  clarity  and  acoustical  warmth  and  beauty. 


10  Beranek,  23. 




Chapter  3  -  Acoustical  Modeling  &  Listener  Feedback 

Much  rehearsal  time  and  energy  is  often  spent  training  choirs  to  sing  the  text  more 
clearly  with  the  goal  of  clear  communication.  In  the  fields  of  choral  conducting  and 
education,  there  is  a  relatively  small  inventory  of  techniques  for  improving  diction,  most 
of  which  are  not  scientifically  based  or  tested. 

In order to assess the effectiveness of various diction improvement techniques, the
influence of various acoustical conditions on choral diction needs to be taken into
consideration. With the expertise and resources of Aercoustics Engineering Ltd.,¹¹
various computer models were created using CATT Acoustic professional room acoustics
software. This software uses three-dimensional modeling to simulate how music sounds
in a specific space.

CATT  Modeling 

The  first  step  in  using  the  acoustical  modeling  software,  CATT  Acoustic,  is  to 
have  the  programmer  map  out  the  room  in  three  dimensions  and  define  its  surface 
properties  (i.e.  reflection/absorption  characteristics  of  walls,  chairs,  ceiling,  and  floors). 
Figure  3.1  below  shows  the  Simon  Fraser  University  Experimental  Theatre  in  three 
dimensions,  as  modeled  in  CATT  Acoustic.  The  darker  red  regions  show  soft,  absorptive 
seating;  the  grey  area  is  hard  flooring;  the  bright  red  surfaces  are  curtains,  used  both  on 
the stage and on the walls of the auditorium to absorb sound energy.


11  Aercoustics  Engineering  Ltd.,  Toronto  ON.  Kiyoshi  Kuroiwa  and  John  O’Keefe,  acoustical  consultants. 




Figure  3.1  -  Simon  Fraser  University  Experimental  Theatre 
mapped  in  three  dimensions  in  CATT  Acoustic,  viewed  from  above 

The  programmer  must  then  define  a  specific  location  for  the  sound  source 
(perhaps  centre  stage,  seen  as  a  small  black  dot  on  the  bottom  right  side  of  the  diagram), 
and  a  location  for  the  receiver  or  listener.  (Note  that  the  centre  of  the  stage  is  located 
where  the  x  and  y  axes  intersect  on  the  bottom  right.)  This  model  includes  seven  possible 
receiver  locations  in  various  positions  on  the  far  side  of  the  audience  area. 

The  acoustical  spaces  chosen  for  the  simulations  in  this  study  are  Vancouver’s 
Queen  Elizabeth  Theatre,  The  Medicine  Hat  Esplanade  Theatre,  Simon  Fraser 
University’s  Experimental  Theatre  without  curtains,  and  the  same  Experimental  Theatre 
with  added  curtains  for  maximum  sound  absorption.  These  spaces  were  chosen  as  typical 
performance  halls  in  Canada,  with  a  variety  of  acoustical  characteristics.  The  respective 
sizes of the three halls are as follows:

Table 3.1 - Seating Capacity and Volume of Three Performance Venues

Theatre                                     Seating    Volume
Queen Elizabeth Theatre, Vancouver BC       2,750      31,915 m³
Esplanade Theatre, Medicine Hat AB          700        12,154 m³
SFU Experimental Theatre, Vancouver BC      400        3,855 m³


These  spaces  were  all  mapped  in  three  dimensions  as  shown: 


Figure  3.2  -  Queen  Elizabeth  Theatre,  Vancouver  BC,  as  modeled  in  CATT  Acoustic, 

viewed  from  stage 

The Queen Elizabeth Theatre is shown in figure 3.2 with three balconies,¹² seats
in  dark  red,  ceiling  and  stage  walls  in  dark  grey,  and  walls  and  flooring  in  white.  The 
figure  also  shows  ceiling  reflectors  in  shades  of  grey. 


12 Note that the Queen Elizabeth Theatre is not constructed as shown in this model. This three-dimensional
drawing was the suggested layout for the QET upgrade, but construction was completed with just one
balcony.


Figure  3.3  -  Esplanade  Theatre,  Medicine  Hat  AB  as  modeled  in  CATT  Acoustic, 

viewed  from  stage 


Figure  3.4  -  Simon  Fraser  University  Experimental  Theatre,  Vancouver  BC  as  modeled  in 
CATT  Acoustic;  viewed  from  stage,  without  curtains  (left)  and  with  curtains  (right) 

In  figure  3.4,  the  model  on  the  right  includes  curtains  on  the  back  and  side-walls 
as  well  as  on  the  stage,  while  the  model  on  the  left  has  curtains  on  the  stage  only.  The 
two  different  models  will  be  used  to  compare  text  intelligibility  for  a  room  with  the  same 
size  and  shape  but  with  different  surface  properties. 

Once the three-dimensional models are complete, the software can then run an
accurate, computer-generated simulation, sending sound waves of various frequencies out
from the source in all directions. The resulting reflection and absorption of sound waves
within the room determine the acoustical response of the room and the specific
acoustical conditions at each receiver location.13

The  acoustical  parameters  for  these  four  performance  spaces,  as  determined  by 
CATT  Acoustic’s  sound  wave  simulation  at  the  receiver  location  chosen  for  each  space, 
are shown in table 3.2.14 As expected, the highest RT value occurs in the large Queen
Elizabeth  Theatre,  and  the  smallest  occurs  in  the  Experimental  Theatre  with  curtains.  The 
G  and  C80  values  correlate  inversely  with  room  size.  For  the  Experimental  Theatre,  G  is 
reduced  and  C80  is  increased  with  added  absorption  (curtains).  This  makes  sense,  since 
more  absorption  will  reduce  reflections,  making  the  sound  quieter  and  the  clarity 
improved  due  to  minimized  reverberation. 
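The link between reverberation and clarity described above can be sketched numerically. The example below assumes the standard room-acoustics definition of clarity (the ratio, in decibels, of the sound energy arriving within the first 80 ms to the energy arriving afterwards) and applies it to an idealized, purely exponential decay whose rate is set by RT. The function names, the sample rate and the idealized decay are illustrative assumptions; this is not the CATT Acoustic calculation, and it is not expected to reproduce the simulated values for the four halls.

```python
import math

def exponential_energy_decay(rt_seconds, sample_rate=1000, length_seconds=3.0):
    """Idealized energy decay: the level falls by 60 dB over rt_seconds."""
    n = int(length_seconds * sample_rate)
    # Energy at time t is 10 ** (-6 * t / RT), i.e. -60 dB when t equals RT.
    return [10 ** (-6.0 * (i / sample_rate) / rt_seconds) for i in range(n)]

def c80_db(energy, sample_rate=1000):
    """Clarity: early energy (first 80 ms) over later energy, in decibels."""
    split = int(0.080 * sample_rate)
    early = sum(energy[:split])
    late = sum(energy[split:])
    return 10 * math.log10(early / late)

# RT values taken from the four simulated spaces; the decay shape is assumed.
for rt in (0.7, 1.2, 1.6, 2.0):
    clarity = c80_db(exponential_energy_decay(rt))
    print(f"RT = {rt:.1f} s -> idealized C80 = {clarity:.1f} dB")
```

In this idealized case a shorter RT always gives a higher C80. Real halls can depart from that pattern because reflectors and room geometry redistribute energy between the early and late parts of the response.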


Table 3.2 - Acoustical Characteristics of Four Performance Spaces,
as determined by CATT Acoustic

Performance Space                               RT (s)          G (dB)      C80 (dB)
                                                Reverberation   Strength    Clarity
                                                Time
Medicine Hat’s Esplanade Theatre                1.6             6           2.8
Vancouver’s Queen Elizabeth Theatre             2.0             1           2.9
SFU’s Experimental Theatre, with curtains       0.7             6           8.0
SFU’s Experimental Theatre, without curtains    1.2             7           3.0

13 For the purpose of this study, one receiver location was chosen in a central position of
the orchestra level for each room. Such a location is considered a typical listener location
since it is in the centre of a group of seats and is not close to any walls or other
reflective surfaces.

14 Values given for each parameter are averaged for 500 Hz and 1 kHz, which represent
mid-range frequencies. Parameters are determined for a representative listener location
within the audience area.

The final programming step is to combine this generic acoustical response
information with an anechoic music sample in order to mimic how the music would
sound if played in this particular performance venue.15 For the purposes of this study, an
anechoic recording produced by Wenger Engineering was chosen as the test piece. The
work is a relatively unfamiliar sacred English choral composition entitled “Who Is This”,
by John Ferguson, and is performed by the 80-voice St. Olaf Cantorei choir. This
performance is of the highest quality, given that the St. Olaf Cantorei Choir is a
well-known professional-level choir. The selection of the obscure English sacred piece was
intentional in order to ensure that the listener data was not skewed by familiarity with the
words being sung. At the end of the computer simulation process, four sound files of this
choral piece were produced, each one designed to simulate the aural experience of an
audience member sitting in each of the four performance spaces.16
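In signal-processing terms, combining a room's acoustical response with an anechoic recording is usually done by convolution: every sample of the dry recording launches a scaled copy of the room's impulse response. The following is a minimal sketch of that idea, assuming toy data; the function name `convolve` and the sample values are hypothetical, and the sketch makes no claim about how CATT Acoustic implements this step internally.

```python
def convolve(dry, impulse_response):
    """Direct-form convolution of a dry signal with a room impulse response."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, sample in enumerate(dry):
        for j, tap in enumerate(impulse_response):
            # Each dry sample contributes a delayed, scaled copy of the response.
            out[i + j] += sample * tap
    return out

# Hypothetical toy data: a short two-sample burst followed by silence.
dry = [1.0, 0.5, 0.0, 0.0]
# Hypothetical room response: direct sound plus two weaker, delayed reflections.
room = [1.0, 0.0, 0.6, 0.3]

wet = convolve(dry, room)
print(wet)
```

The output is longer than the dry input and carries energy in positions where the dry signal was silent, which is the smearing that makes short consonant events harder to isolate in a reverberant room.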


Listener  Feedback 

The resulting sound files were then taken to various listeners for their feedback
regarding text intelligibility. Five sets of listeners were sampled, each with six listeners,
making a total of thirty listeners. All listeners heard the same six short phrases of music
played in the same order. Each phrase used one of the simulated performance venue
samples or the original anechoic chamber recording. Each listening group was given a
different combination of performance hall simulations according to table 3.3. Each
listener heard the samples individually, and the same set of headphones was used for all
listeners.17 The listeners included a random sampling of musicians and non-musicians
across various age groups and social backgrounds in order to represent a typical choral
concert audience.

15 The audio file used in conjunction with CATT Acoustic must be a sound file created in
an anechoic chamber - a room devoid of reverberation.

16 The number of listening excerpts chosen was simply a function of the number of unique
phrases available from the chosen repertoire.

17 Headphones used for the study were AKG model K240.


Table 3.3 - Phrase and Venue Combinations for Each Listener Set

Phrase                                                  Anechoic  Esplanade  Queen Elizabeth  SFU Theatre       SFU Theatre
                                                        Chamber   Theatre    Theatre          without curtains  with curtains
1 “Who is this who walks among us”                      Set 1     Set 2      Set 3            Set 4             Set 5
2 “who is this who speaks such ...”                     Set 5     Set 1      Set 2            Set 3             Set 4
3 “is it Moses or Elijah or some prophet of the Lord”   Set 4     Set 5      Set 1            Set 2             Set 3
4 “...suffering servant”                                Set 3     Set 4      Set 5            Set 1             Set 2
5 “can we name the promised Son”                        Set 2     Set 3      Set 4            Set 5             Set 1
6 “can we name the heir of David”                       Set 1     Set 2      Set 3            Set 4             Set 5
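The assignments in table 3.3 follow a simple rotation: with each new phrase, every listener set moves one venue to the right, wrapping around after the fifth venue. The sketch below reproduces the table from that rule. The rule is inferred from the table itself rather than stated in the study, and the function names are hypothetical.

```python
VENUES = [
    "Anechoic Chamber",
    "Esplanade Theatre",
    "Queen Elizabeth Theatre",
    "SFU Theatre without curtains",
    "SFU Theatre with curtains",
]

def listener_set(phrase, venue_index):
    """Listener set (1-5) that hears `phrase` (1-6) in the venue at `venue_index` (0-4)."""
    return (venue_index - (phrase - 1)) % len(VENUES) + 1

def venue_for(phrase, set_number):
    """Inverse lookup: the venue in which a given listener set hears a given phrase."""
    return VENUES[(set_number - 1 + phrase - 1) % len(VENUES)]

for phrase in range(1, 7):
    row = [listener_set(phrase, v) for v in range(len(VENUES))]
    print(f"Phrase {phrase}: listener sets by venue column = {row}")
```

Because there are six phrases and only five venues, the rotation returns to its starting position at phrase 6, so each listener set hears phrases 1 and 6 in the same venue.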



The  cumulative  results  of  the  participants’  listening  feedback  are  shown  in  table  3.4. 


Table 3.4 - Intelligibility Results of Listener Feedback for Five Performance Spaces

Text Intelligibility: percentage of words correctly identified.
Vowel Intelligibility: percentage of vowel sounds correctly or approximately identified.
Consonant Intelligibility: percentage of consonants or consonant clusters correctly or
approximately identified.

Performance Space     Text             Vowel            Consonant
                      Intelligibility  Intelligibility  Intelligibility
Anechoic chamber      65%              80%              68%
Esplanade Theatre     42%              64%              46%
QET                   29%              55%              35%
SFU (curtains)        45%              68%              48%
SFU (no curtains)     40%              66%              49%

Figure  3.5  -  Intelligibility  of  Text,  Vowels  and  Consonant  Clusters  in  Five  Different 
Performance  Venues  According  to  Listener  Feedback18 


18 Refer to Appendix C for detailed data on listener feedback.

In general, increased reverberation time results in decreased intelligibility. The
anechoic chamber recording, for example, yielded a much higher text intelligibility than
any of the other spaces. This result is to be expected since the anechoic recording
provides the listener with direct sound only, whereas the other recordings include delayed
reflected sound, thus reducing acoustic clarity and therefore intelligibility. As expected,
text and vowel intelligibility increased slightly with the addition of curtains in the SFU
Experimental Theatre, since reverberance is reduced and clarity increased in this
configuration. The one-point reduction in consonant intelligibility (from 49% to 48%) with
the addition of curtains is negligible, and certainly within the margin of error for the
experiment.

The overall correlations between acoustical characteristics and intelligibility are as
follows:

•  Increased RT (reverberation time), though it increases acoustical richness
and beauty, decreases intelligibility.

•  Increased C80 (clarity) correlates positively with intelligibility. It should
be noted that increased RT does not necessarily mean a decrease of clarity, as seen in
table 3.2. The acoustical reflectors used in the Queen Elizabeth Theatre are designed
to add to early reflections and thereby increase clarity, counteracting the effect of a
long reverberation time on textual intelligibility.

•  Increased strength correlates with an increase of intelligibility.
Interestingly, though strength is low in the Queen Elizabeth Theatre (1 dB, compared
with 7 dB in the Experimental Theatre), strength is 0 dB, by definition, in the anechoic
chamber, where the intelligibility is still very high. Therefore, strength is not the
principal factor in determining intelligibility.
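The direction of each of these relationships can be checked against the tabulated values with a Pearson correlation coefficient. The sketch below uses the RT, G and C80 figures from table 3.2 and the text-intelligibility figures from table 3.4 for the four simulated halls. The choice of Pearson's coefficient and the helper name `pearson` are assumptions made for illustration; with only four data points the coefficients indicate direction, not statistical significance.

```python
import math

# Order: Esplanade, Queen Elizabeth, SFU with curtains, SFU without curtains.
rt_seconds = [1.6, 2.0, 0.7, 1.2]   # table 3.2
strength_db = [6, 1, 6, 7]          # table 3.2
clarity_db = [2.8, 2.9, 8.0, 3.0]   # table 3.2
text_pct = [42, 29, 45, 40]         # table 3.4

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

for label, values in (("RT", rt_seconds), ("G", strength_db), ("C80", clarity_db)):
    print(f"{label} vs text intelligibility: r = {pearson(values, text_pct):+.2f}")
```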

What may be unexpected is the indication here that in any live performance hall,
text intelligibility is less than 50%. Even when looking at consonants alone, intelligibility
is less than 50%, and for vowels alone, it is not much better (between 55% and 68%).
Although this study was limited to a few phrases performed in very specific conditions,
the results would indicate that, at best, listeners in a performance hall would understand
half of the text. What is particularly noteworthy about this result is that these sound
recordings are made in ideal conditions: a professional-level choir singing in an anechoic
chamber, where there is no reverb to confuse the choristers.19 This means that the timing
of their words is ideal relative to typical recording conditions and their text is very
accurate, relative to amateur choirs. In a real-life scenario with an amateur choir, text
intelligibility is likely to be even lower.

The balance between RT and intelligibility is an important one, especially for
choral music in which musical beauty and textual clarity are both important factors.
Historically, intelligibility has been favoured over reverberance, with RT values in the
range of 1.2 to 1.4 s for nineteenth-century opera houses. Today, the rich reverberant sound
has gained popularity at the expense of intelligibility, with modern-day opera houses
designed for RT between 1.4 and 1.7 s.20 This underlines the present need for improved
clarity of text, given the aesthetic preference for reverberance among musicians and the
general public.


19  Each  singer  in  the  St.  Olaf  Cantorei  Choir  listened  to  the  other  singers  through  a  set  of  headphones 
during  the  performance  in  order  to  improve  auditory  feedback  and  synchronization  among  the  singers 
within  the  echoless  chamber. 

20  John  O'Keefe,  Acoustician,  Aercoustics  Engineering  Ltd.,  Toronto,  ON. 




Chapter  4  -  Analysis:  Diction  Problems  and  Solutions 


The study of diction involves classification of consonant sounds according to how
they are formed. Below is a list of the commonly-used terms, the International Phonetic
Alphabet (IPA) symbol (in square brackets) of the consonants comprising each consonant
group, and sample English words in which the consonants are found.

•  Plosive or Stop Consonant [p t k b d g] - explosive sounds as in the words “pie, tie,
kite” (unvoiced) and “buy, die, guy” (voiced)

•  Fricative [f θ s ʃ v ð z ʒ] - friction sounds as in “fin, thin, sin, shin” (unvoiced)
and “vain, than, zing, Asian” (voiced)

•  Affricate [tʃ dʒ] - combination of plosive and fricative consonants as in the
words “catch” and “edge.”

•  Nasal [m n ŋ] - sound created by air stream passing through the nasal passages as
in the words “mom” “none” and “sing”

•  Approximant [w r j l] - vowel-like consonants as in “we” “re” “ye” and “lea”

•  Glottal [h] - friction sounds made in the throat such as the word “high”


This study will focus on plosive and fricative consonants, as well as nasals and
approximants. The other types, affricates and glottals, will not be addressed here since
they occur less frequently in the English language, and since the exploration of these
consonants would require research that goes beyond the scope of this paper.

Intelligibility  data  from  the  listener  feedback  for  fricatives,  plosives, 
approximants  and  nasals  shows  how  each  of  the  consonant  types  reacts  to  various 
acoustical  characteristics: 




Table 4.1 - Intelligibility of Consonant Sounds in Various Performance Venues21

Performance Space                        Unvoiced   Voiced     Fricatives   Approximants  Nasals
                                         Plosives   Plosives   [f θ s ʃ]    [w r j l]     [m n ŋ]
                                         [p t k]    [b d g]    [v ð z ʒ]
Anechoic chamber                         74%        71%        85%          94%           62%
Medicine Hat’s Esplanade Theatre         45%        42%        78%          75%           47%
Vancouver’s Queen Elizabeth Theatre      33%        25%        68%          65%           42%
SFU’s Experimental Theatre (curtains)    60%        38%        79%          69%           42%
SFU Experimental Theatre (no curtains)   57%        25%        75%          78%           37%

According  to  this  data,  the  most  problematic  consonant  types  are  plosives  (both 
voiced  and  unvoiced),  and,  perhaps  more  surprisingly,  nasals.  In  the  case  of  plosives,  the 
listener  feedback  suggests  that  these  types  of  consonant  sounds  effectively  disappear  in 
reverberant  spaces.  For  nasal  consonants  in  reverberant  spaces,  the  listener  recognizes  the 
consonant  as  a  nasal,  but  in  most  cases  is  unable  to  correctly  identify  which  nasal  is  being 
sung. 
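The ranking of consonant types can be made explicit by averaging each column of table 4.1 over the four reverberant halls, leaving out the anechoic chamber. The sketch below does this with the tabulated percentages. Taking an unweighted mean across halls is an assumption made here for illustration and is not a calculation reported in the study.

```python
# Percentages from table 4.1, in the order:
# Esplanade, Queen Elizabeth, SFU (curtains), SFU (no curtains).
reverberant_halls = {
    "unvoiced plosives": [45, 33, 60, 57],
    "voiced plosives": [42, 25, 38, 25],
    "fricatives": [78, 68, 79, 75],
    "approximants": [75, 65, 69, 78],
    "nasals": [47, 42, 42, 37],
}

def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)

# Sort consonant types from least to most intelligible.
ranking = sorted(reverberant_halls, key=lambda name: mean(reverberant_halls[name]))

for name in ranking:
    print(f"{name:18s} {mean(reverberant_halls[name]):5.1f}%")
```

On this measure the three least intelligible groups are the two plosive types and the nasals, while fricatives and approximants remain well above them.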

Approximants  and  fricatives  are  not  as  difficult  to  distinguish;  they  actually 
become  exaggerated  by  a  live  acoustic  space  and  so  must  be  carefully  managed  in  order 
to  prevent  them  from  dominating  the  overall  sound.  (The  acoustical  reasons  for  this  will 
be  explained  further  in  the  sections  that  follow  on  approximants  and  fricatives.) 
Exaggerated  ‘s’  sounds,  for  example,  tend  to  overpower  the  consonant  and  vowel  sounds 
around  them,  causing  an  obscured  overall  effect,  thereby  reducing  the  choral  text  to  a 
string  of  inarticulate  hissing. 

21  Refer  to  appendix  C  for  RT  correlation  graphs. 




Spectrograms 

In  order  to  gain  a  clearer  understanding  of  how  consonant  sounds  are  affected  by 
concert  hall  acoustics,  it  is  helpful  to  use  a  phonetics  tool  known  as  a  spectrogram. 
Originally  devised  for  the  purpose  of  providing  the  deaf  with  a  visual  representation  of 
speech,  spectrograms  are  “3-dimensional  graphs  of  time,  frequency,  and  amplitude.”22 
The  vertical  axis  shows  frequency,  the  horizontal  axis  represents  time,  while  the  degree 
of  darkening  corresponds  to  the  amplitude  or  power  of  the  sound  wave.  A  dark  region  on 
the  graph  thereby  represents  a  relatively  high  amplitude,  whereas  a  white  region  indicates 
that  the  amplitude  at  this  frequency  and  time  is  too  low  to  register. 
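The three quantities on a spectrogram can be computed by slicing a signal into short overlapping frames and taking the frequency spectrum of each frame. The sketch below is a minimal illustration, assuming a synthetic two-tone test signal, a 64-sample frame and a Hann window. These choices, and the function name `spectrogram`, are hypothetical and are not the settings of the phonetics software used for the figures in this chapter.

```python
import cmath
import math

def spectrogram(signal, frame_size=64, hop=32):
    """Return one list of spectral magnitudes per time frame (short-time DFT)."""
    # A Hann window tapers each frame to reduce leakage between frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            total = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames

# Hypothetical test tone: 1 kHz for the first half, 3 kHz for the second half.
sample_rate = 8000
tone = [math.sin(2 * math.pi * (1000 if n < 512 else 3000) * n / sample_rate)
        for n in range(1024)]

frames = spectrogram(tone)
bin_hz = sample_rate / 64  # frequency spacing between rows of the spectrogram

def peak_frequency(frame):
    """Frequency of the strongest bin in one frame."""
    return max(range(len(frame)), key=frame.__getitem__) * bin_hz

print(peak_frequency(frames[0]), peak_frequency(frames[-1]))
```

Each inner list corresponds to one column of the spectrogram (a moment in time), each position within it to a row (a frequency), and the magnitude to the darkness of the plot at that point.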

In their book The Handbook of Phonetic Sciences, Hardcastle, Laver and Gibbon
discuss how the calculations that determine frequency spectrum at various instances of
time mimic the way the cochlea of the inner ear converts time-varying pressure of
sound waves into frequency data using a series of finely tuned resonators.23 Visually,
spectrograms can be interpreted in much the same way that the brain interprets cochlear
data: by observing the movement of ‘formants’. Formants are energy concentrations that
occur in specific frequency ranges, represented by darker horizontal or diagonal lines on
the spectrogram. Formants correspond to the natural resonances of the vocal tract
produced by specific and varying anatomical shapes brought about by the movement of
articulators. In other words, the movement of formant lines on a spectrogram is
analogous to the movement of the articulators in producing speech. In particular, the
degree of opening of the mouth affects the frequency of the first formant, while the
tongue position is largely responsible for the frequency of the second formant.

22 Nigel Hewlett and Janet Beck, An Introduction to the Science of Phonetics (Mahwah, NJ:
Lawrence Erlbaum Associates, 2006), 165-167.

23 William J. Hardcastle, John Laver and Fiona Gibbon, eds., The Handbook of Phonetic
Sciences, 2d ed. (Malden, MA: Wiley-Blackwell, 2010), 770-771.

Physically,  this  formant  movement  relies  on  a  phonetic  speech  concept  called 
“secondary  articulation”.  Secondary  articulation  refers  to  detailed  placement  of  the 
articulators  when  the  vocal  apparatus  makes  a  sound.  The  primary  articulators  are 
responsible  for  creating  a  particular  consonant  sound  such  as  [t],  which  involves  placing 
the  tongue  on  the  hard  palate  directly  behind  the  teeth.  This  is  known  as  “palatization”. 
The  secondary  articulation  in  this  case  refers  to  the  subtle  lip-rounding,  or  “labialization”, 
that  occurs  at  the  same  time  as  the  tongue  placement,  and  the  narrowing  of  the  pharynx, 
called  “pharyngealisation”,  achieved  by  pulling  the  tongue  backward  into  the  pharynx. 
The  sounds  produced  with  these  secondary  articulations  are  noticeably  different  than 
sounds  produced  without  them.  They  affect  the  overall  shape  of  the  vocal  tract,  which  in 
turn  affects  the  perceived  sound  and  the  appearance  of  the  spectrogram,  more  specifically 
the  formant  movements  (or  patterns)  that  are  displayed.24  Hardcastle  et  al  give  the 
example of the words “bit” and “built”, both of which have the sound [t]. However, the
sound produced in ‘built’ is noticeably different because the tongue is preparing for
‘velarization’ of the lateral consonant [l], so the shape of the mouth is different and the
second formant is noticeably changed.25 It is these types of articulatory subtleties that can
impede  precise  textual  perception. 

Figure 4.1 shows the word “sigh” in spectrogram form. The darker region
between 4000 and 10,000 Hz represents a concentration of energy in the upper
frequency spectrum that occurs for the sibilant [s]. The first three formants are visible in
the lower frequency range, shown as darker bands of energy starting at approximately
500 Hz, 1000 Hz and 2500 Hz respectively. The first and second formants move up
during the first half of the word, as the mouth opens for the [a] vowel. The first and
second formants move apart for the second half of the word, as the mouth closes and
the tongue moves up and back for the diphthong [aɪ].

Figure 4.1 - Spectrogram of “sigh” [saɪ] spoken by a single voice

In continuous singing or speech, spectrograms show no gaps or visible separations
in the way that we write out separate vowels and consonants. Even though we may
distinguish vowels and consonants separately in our minds, the vocal tract is in continuous
motion. The positions of the articulators in the various gestures that make up speech are
influenced by the sounds (and positions of articulators) that precede and follow any given
sound. In other words, “the properties of one sound” as influenced by the position of the
articulators, “tend to spread onto neighbouring sounds.”26 This is known as
“co-articulation”. The authors give the example of the word “daydream”. Even though the
[d] is articulated twice, the point of alveolar contact in the second [d] is noticeably
further back on the alveolar ridge, which is required for the [r] that follows.27

24 Michael Ashby and John Maidment, Introducing Phonetic Science (Cambridge:
Cambridge University Press, 2005), 119-120.

25 Ibid., 121.

26 Ashby and Maidment, 121.

Plosives 

Many  reputable  authors  and  conductors  suggest  that  ‘under-definition’  is  to  blame 
when  plosive  consonants  are  not  heard.  Some  even  accuse  the  choristers  of  ‘carelessness 
and  indifference’.28  A  common  instruction  given  to  help  overcome  the  difficulty  in 
hearing  plosives  is  to  exaggerate  all  consonants,29  or  to  make  them  stronger  relative  to 
vowel  sounds.30  However,  the  relative  clarity  of  the  anechoic  choral  recordings  would 
seem  to  suggest  that  ‘carelessness  and  indifference’  are  not  actually  the  problem  here;  nor 
is  under-definition  to  blame.  Looking  at  the  negative  correlation  between  RT  and 
intelligibility  of  plosives,  it  seems  clear  that  room  acoustics  are  principally  to  blame. 

Even if under-definition were the root problem, there are two main difficulties with the
technique of exaggerating plosive sounds. Firstly, as Wilson points out in his book
Artistic Choral Singing, over-definition of the consonants may become an obsession and
a “detriment to the whole musical effect”.31 Secondly, by placing the focus on consonant
sounds, the vowel sounds may become compromised and weakened, detracting from the
quality of the sound as a whole, as well as affecting the intelligibility of text. Because
vowel sounds inherently possess so much of the sustained acoustic energy of the singing
voice, even if this consonant-boosting method could successfully improve plosives, it
may actually decrease the overall intelligibility.

27 Ibid., 124.

28 Harry Robert Wilson, Artistic Choral Singing: Practical Problems in Organization,
Technique and Interpretation (NY: Schirmer Inc., 1959), 123.

29 Ibid., 145; Don L. Collins, Teaching Choral Music, 2nd ed. (Upper Saddle River, NJ:
Prentice Hall, 1999), 298.

30 Gordon H. Lamb, Choral Techniques, 3rd ed. (Dubuque IA: Wm. C. Brown, c1988), 70;
Robert L. Garretson, Conducting Choral Music, 6th ed. (Englewood Cliffs, NJ: Prentice
Hall, c1988), 96-97.

31 Wilson, 124.

When  it  comes  to  the  voiced  plosives,  it  is  common  for  choir  directors  to  instruct 
singers  to  change  [b],  [d]  and  [g]  sounds  to  [p],  [t]  and  [k]  sounds  respectively,  in  order  to 
add  power  to  the  consonant  sounds.  Their  hope  is  that  a  weakened  [p]  will  turn  into  a  [b] 
sound,  for  example.  The  problem  with  this  type  of  modification  is  that,  according  to  the 
results of this study, unvoiced consonants [p t k] do not actually become their weaker
voiced relatives [b d g] when sung in very large performance halls. Instead, the consonant
sounds  seem  to  blur  and  disappear  altogether.  This  means  that  changing  [b],  [d]  and  [g]  to 
[p],  [t]  and  [k]  in  order  to  improve  clarity  may  actually  cause  the  audience  to  hear  [p],  [t] 
and  [k],  and  thus  further  obscure  the  text.  Upon  close  inspection,  the  listener  data  shows 
that  [p],  [t]  and  [k]  were  only  rarely  mistaken  for  [b],  [d]  or  [g],  and  [b],  [d]  and  [g]  were 
mistaken  for  the  more  powerful  unvoiced  [p],  [t]  and  [k]  an  equal  number  of  times,  if  not 
more.32 So, if increased intelligibility is the goal, it is unwise to change voiced plosives to
unvoiced  plosives  in  order  to  add  clarity  since  the  net  result  may  very  well  be  reduced 
clarity. 

There is an explanation for why plosive consonants, either voiced or unvoiced,
often disappear in large performance halls. It has to do with the phonetic science behind
these sounds and the way the brain interprets them. Plosives, or ‘stop consonants’,
involve a momentary complete stoppage of airflow in the vocal tract. The voiced stop
consonants [b d g] are formed in the mouth in almost the same way as their unvoiced
equivalents [p t k]. In both types of plosives, a small explosion of air occurs after the
stoppage of air-flow, making a short high frequency noise burst just before the transition
to the vowel sound that follows. The interesting thing about stop consonants is that the
brain does not interpret the stop consonant as an isolated event. In other words, there is
no way to pronounce or to understand a [b], for example, without having an adjacent
vowel. Instead of hearing a [b], the brain hears a syllable such as [ba] by detecting the
transitions that occur from the [b] to the [a]. In fact, when hearing the word “ban”, the
brain detects a brief stop, and then hears a movement of frequencies as the mouth shape
changes from a closed lip formation for the [b] to the [æ] vowel shape that follows. As
phonetics expert and author Peter Ladefoged points out, the brain is able to hear and
correctly interpret plosive consonants even without hearing the stoppage of sound.33 In
other words, the human brain can understand a consonant just by hearing the transition of
the mouth shape and the resulting frequencies that are produced.34 The fact that plosive
intelligibility relies on this fast and subtle transition of formants makes it especially
susceptible to acoustical blurring in large performance halls. This will become clear with
the spectrogram analysis that follows.

32 See appendix C for listener feedback data.

The following spectrogram shows the formant patterns characteristic of plosive
consonants and subsequent vowels/diphthongs. The frequency components of each word
are displayed on the vertical axis, and time on the horizontal axis. The spectrogram shows
how the formants move up or down as the words “buy”, “die” and “guy” are being
pronounced. The movements of the first and second formants are of particular interest to
this study. These formants carry the vowel information, and correspond quite closely to
the action of the opening of the mouth and the movement of the tongue, though other
secondary articulation also comes into play. Generally speaking, the degree of openness
of the mouth affects the first formant and the position of the tongue affects the second
formant.

33 Peter Ladefoged, Vowels and Consonants: An Introduction to the Sounds of Languages
(Malden, MA: Blackwell Pub., c2005), 50.

34 Ashby and Maidment, 22.

Figure 4.2 - Spectrogram of “buy, die, guy” [baɪ daɪ gaɪ] spoken by a single voice35

The first formant moves up at the beginning of all three words since the vocal
tract resonances are low when the lips are nearly closed, and rise as the mouth opens.36
The red dots indicate the location of the first three formants at the beginning of each
word. For the word “buy”, the second formant starts low, close to the first formant
because the tongue remains low in the mouth for the [b] of “buy” as compared with the
[d] of “die” or the [g] of “guy”. The distinction between “die” and “guy” is more subtle,
although the latter has a slightly higher tongue position and therefore higher second
formant and a lower third formant due to differences in secondary articulators in
transitioning from [g] to [a]. For all three of these words, the formants follow essentially
the same paths toward the end of the word since they all share the same diphthong [aɪ].
As the mouth closes and the tongue rises for this diphthong, the first and second formants
move apart from one another.

35 Ladefoged, 52.

36 Ibid., 50.

The  following  spectrograms  compare  plosive  consonants  in  the  anechoic  chamber 
with  those  produced  in  the  Queen  Elizabeth  Theatre.  In  the  reverberant  hall,  the  transitory 
formants  are  obscured  not  only  by  the  80  voices  but  also  by  the  reflected  sound. 


Figure 4.3 - Spectrogram of St. Olaf Cantorei choir singing “speaks such” [spik sʌtʃ];
on the left, in anechoic chamber; on the right, in the Queen Elizabeth Theatre

In the anechoic spectrogram, on the left side of figure 4.3, the second formant is
quite visible, moving clearly up and down in the word “speaks” starting at the red dot
added to the graph. The [s] sounds that begin and end the word are very easy to spot as a
vertical band of energy in the range of 3.5 to 7 kHz in both graphs. The red lines indicate
the region where the stop occurs for the [p] and the [k]. The noise bursts for these stops
occur in a similar range to where the [s] is found on the spectrogram, making the high
frequency bursts very difficult to hear.

In the Queen Elizabeth Theatre spectrogram, on the right side of figure 4.3, the
energy level in general is higher than in the anechoic spectrogram, simply because of the
reflected sound energy. This results in a much darker graph. The high frequency [s]
energy from the first [s] of “speaks” extends all the way to the second [s] in the Queen
Elizabeth Theatre spectrogram, obscuring other sounds in between, including the plosive
consonants. The second formant is still visible, but somewhat less distinct. The stop
regions (where there are white spaces with less energy on the spectrogram) that are barely
visible in the anechoic recording have become quite imperceptible in the Queen Elizabeth
Theatre graph. The visual obscuring of these consonants is corroborated by the listener
data as well. All but one listener heard the [p] and the [k] of “speaks” in the anechoic
recording, but with the reverb of the Queen Elizabeth Theatre, no one heard either the [p]
or the [k].37

Nasals 

The  next  important  category  of  consonant  sounds  that  is  greatly  affected  by 
performance hall acoustics is nasals [m n ŋ]. According to the data gathered in the
listening  exercise,  this  category  is  very  problematic,  second  only  to  plosive  consonants. 
Clarity  ranged  from  a  high  of  67%  in  the  anechoic  chamber  to  as  low  as  37%  in  SFU’s 
Experimental  Theatre  with  no  curtains. 

37  Refer  to  appendix  C  for  listener  feedback  data. 




Many reputable conductors and authors recommend improving the clarity of nasal
consonants through elongation of the sound.38 Elongation is unlikely to improve
intelligibility, however, since nasals do not disappear the way that plosives do in a ‘live’
acoustic, but rather they become indistinguishable. In other words, listeners have trouble
distinguishing between [m], [n] and [ŋ] sounds. The data shows that as the reverberance
increases, so does the confusion between these nasal sounds. So, elongation may help the
listener to identify a nasal, but it is not likely to help identify which nasal is being sung.

Some authors and conductors suggest that some consonants, including nasals [m],
[n] and [ŋ] (as well as [l] and diphthongs), should be assigned a time value in order to
improve clarity.39 For example, if the word “long” is being sustained over a half note, a
director might ask for the [l] to be sung for one eighth note, the vowel for one quarter
note, and the [ŋ] for one eighth note. It is possible that this timing precision may help
with intelligibility of nasals to a certain extent. A spectrogram analysis will clarify why
this is the case.
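The time values in the “long” example can be turned into durations in seconds. The following is a minimal sketch only: the tempo of 60 quarter notes per minute is an assumed value that does not come from this study, and the function names are hypothetical.

```python
# Sketch: convert the note values in the "long" example to seconds.
# The tempo is a hypothetical value; the essay does not specify one.

def note_seconds(beats: float, tempo_bpm: float) -> float:
    """Duration in seconds of a note lasting `beats` quarter-note beats."""
    return beats * 60.0 / tempo_bpm


def long_timing(tempo_bpm: float) -> dict:
    """Split a half note into eighth + quarter + eighth, as in the example."""
    return {
        "l": note_seconds(0.5, tempo_bpm),      # eighth note
        "vowel": note_seconds(1.0, tempo_bpm),  # quarter note
        "ng": note_seconds(0.5, tempo_bpm),     # eighth note
    }


timing = long_timing(tempo_bpm=60.0)  # assumed tempo: quarter note = 60
for sound, seconds in timing.items():
    print(f"{sound}: {seconds:.2f} s")
```

At this assumed tempo each consonant would last half a second, which gives a sense of how much sustained nasal sound the listener receives compared with the brief onset that identifies it.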


38 Lamb, 78; Garretson, 94.

39  Harold  A.  Decker,  Choral  Conducting:  Focus  on  Communication  (Prospect  Heights,  IL:  Waveland 
Press,  1995),  125. 




Figure 4.4 - Spectrogram of “ram, ran, rang” [ræm ræn ræŋ] spoken by a single voice

In this spectrogram for the words “ram”, “ran”, and “rang”, the second and third
formants come together for the [ŋ], whereas for “ran” and “ram”, the third formant stays
relatively constant while the second falls slightly. (Formants are shown beginning at the
red dots on each graph.) The red diagonal lines correspond to the fast upward formant
movement associated with an [r] sound, as we will see again shortly in the discussion of
approximants.

After  the  transitory  portion  of  each  nasal  consonant,  there  is  a  distinct  decrease  of 
energy  when  the  tongue  or  lips  constrict  airflow  and  the  sound  escapes  only  from  the 
nose  (see  red  boxes  in  figure  4.4).  Although  the  transitory  portion  is  distinct  for  each  type 
of  nasal,  once  the  longer  final  portion  of  the  consonant  has  been  reached,  all  three  nasals 
look  essentially  the  same.  (In  this  example,  there  is  an  extra  concentration  of  energy  at 
about  2000  Hz  and  above  at  the  end  of  the  word  “rang”,  where  the  mouth  opens  up  to  a 
final  vowel  sound.)  Therefore,  it  is  only  in  the  transitory  onset  of  the  nasal  that  the 
differentiation occurs. This explains why the nasal consonants do not completely
disappear when muddled by a ‘live’ acoustical space. The onset of a particular nasal
consonant,  which  helps  to  distinguish  it  from  the  other  nasal  consonants,  involves  a  quick 
and  subtle  movement  of  formants,  similar  to  plosive  consonants,  which  are  easily  blurred 
by  reverberant  sound.  The  final  closed  portion  of  the  nasal  is  longer  with  little  or  no 
formant  movement,  making  it  easier  to  discern,  even  with  increased  reflected  sound. 

Since  the  transitory  portion  of  the  nasal  is  most  easily  corrupted  by  reverberant  sound, 
and  this  transitory  portion  is  key  to  intelligibility  of  nasals,  it  is  clear  that  reverberant 
sound  is  detrimental  to  intelligibility  of  nasals. 

A  spectrogram  analysis  reveals  how  acoustical  conditions  can  cause  a  lack  of 
clarity  in  nasal  consonants.  Figure  4.5  shows  the  word  “son”  in  the  anechoic  chamber  and 
figure  4.6  shows  the  same  portion  of  music  being  sung  in  SFU’s  Experimental  Theatre 
without  curtains. 




Figure 4.5 - St. Olaf Cantorei Choir singing “son” [sʌn] in anechoic chamber

Just before this closed nasal portion of the [n] consonant, the spectrogram shows a
transitional formant movement as the tongue moves upward to close airflow through the
mouth. This can be seen by an upward movement of formants at 1000 Hz and 1500 Hz. There
is some overlap between this upward formant movement and the lower energy concentration
of the closed nasal. This overlap is due to a lack of synchronization as the singers move
their mouths from the [ʌ] vowel to the [n] consonant. In the SFU scenario without curtains,
shown in figure 4.6, the closed [n] sound is reverberated in the room to such an extent that
it is carried into the following word, as compared to the spectrogram of the anechoic
chamber recording. The lively reverberance of this lower frequency range in the room makes
the sustained portion of the nasal very easy to hear (beginning just after 108 s), but the
transitory part of the nasal, identifying it as an [n], is more difficult to discern in this
spectrogram. Although slight upward formant motion is visible at 1000 and 1500 Hz, it is
mostly overrun by the reverberance of the vowel formant that precedes the [n], before 108 s
on the spectrogram.

Figure 4.6 - St. Olaf Cantorei Choir singing “son” [sʌn]
in SFU’s Experimental Theatre without curtains

Fricatives 

Fricatives are produced by friction of airflow against an articulator, and may or may not
involve vibration of the vocal folds (voiced and unvoiced fricatives). For the purposes of this
analysis, we will focus on the unvoiced sub-category of fricatives, including [f θ s ʃ]. Unlike
plosive consonants, fricatives can stand alone, and do not rely on mouth movement to adjacent
vowels in order to be understood. On a spectrogram, a fricative can be identified by the location
of a concentration of energy in the upper frequency range. For example, an [s] is identified by the
presence of a darker region in the range of 8 to 10 kHz. The consonant [θ] is less powerful, and
features a concentration of energy in the region of 8 kHz. A [ʃ] features high-energy
concentration between 2 and 5 kHz, with energy all the way up to 10 kHz, similar to an [s]. An
[f] sound has less energy, distributed evenly from 2 kHz up to 10 kHz.
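The frequency regions described above can be collected into a small lookup table. This is only an illustrative sketch: the band edges are read from the paragraph above, the half-kilohertz width given to [θ] is an assumption, and the labels and function name are hypothetical rather than part of any analysis software used in this study.

```python
# Approximate energy regions (in kHz) for unvoiced fricatives, read from the
# paragraph above. Keys are plain-text labels: "th" stands for [θ], "sh" for [ʃ].
FRICATIVE_BANDS_KHZ = {
    "s": (8.0, 10.0),
    "th": (7.5, 8.5),  # "region of 8 kHz"; the +/-0.5 kHz width is an assumption
    "sh": (2.0, 5.0),  # main concentration only; weaker energy reaches 10 kHz
    "f": (2.0, 10.0),  # weaker energy, spread evenly
}


def fricatives_with_energy_at(freq_khz: float) -> list[str]:
    """Return labels of fricatives whose listed band contains freq_khz."""
    return sorted(
        label
        for label, (low, high) in FRICATIVE_BANDS_KHZ.items()
        if low <= freq_khz <= high
    )


for probe in (3.0, 8.0, 9.5):
    print(probe, "kHz:", fricatives_with_energy_at(probe))
```

The overlap between bands suggests why a spectrogram reader relies on the strength and shape of the energy region, and not on frequency alone, to tell one fricative from another.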




Figure 4.7 - Spectrogram of “fie, thigh, sigh, shy”
[faɪ θaɪ saɪ ʃaɪ] spoken by a single voice

It would seem, from the listener data as well as from common practice in the conducting
world, that fricatives are relatively easy to hear in choral performance, particularly sibilants
[s] and [ʃ], which contain more energy in the higher end of the frequency spectrum. Intelligibility for
fricatives  was  as  high  as  85%  in  the  anechoic  chamber  in  the  listener  feedback  of  this  study.  The 
lowest  intelligibility  for  fricatives  was  68%  in  the  Queen  Elizabeth  Theatre,  still  well  above  the 
overall  text  intelligibility  in  this  venue.  The  problem  with  sibilants  is  that  they  are  heard  too 
easily  and  too  prominently  in  a  reverberant  space,  often  obscuring  other  sounds  around  them. 
Unlike  plosives  and  nasals,  which  seem  to  become  obscured  or  disappear,  sibilants  tend  to  take 
over. 
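The contrast between fricatives and nasals can be made concrete with the listener figures quoted in this chapter. The sketch below simply restates those percentages as a drop in percentage points and as a fraction of the anechoic score; the function name is hypothetical, and the lowest scores come from different venues (SFU’s Experimental Theatre without curtains for nasals, the Queen Elizabeth Theatre for fricatives).

```python
def intelligibility_drop(best_pct: float, worst_pct: float) -> tuple[float, float]:
    """Return (percentage-point drop, drop as a fraction of the best score)."""
    points = best_pct - worst_pct
    return points, points / best_pct


# (anechoic score, lowest score in a simulated venue), as reported in the text.
REPORTED = {
    "nasals": (67.0, 37.0),
    "fricatives": (85.0, 68.0),
}

for category, (best, worst) in REPORTED.items():
    points, fraction = intelligibility_drop(best, worst)
    print(f"{category}: {points:.0f} points, {fraction:.0%} of the anechoic score")
```

On these figures, nasals lose a considerably larger share of their anechoic intelligibility than fricatives do.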




Some fricatives become very long and loud because of a lack of synchronization within
the choir, and occasionally because of elongation by amateur singers. However, when the
reverberance  of  a  performance  venue  is  added  to  the  imprecision  of  many  voices  singing  a 
fricative,  it  results  in  the  consonant  sounds  being  even  longer  and  louder.  The  difficulty  with 
fricatives  is  that  they  occur  at  high  frequencies  on  the  spectrum,  and  with  a  lot  of  power, 
relatively  speaking.  Plosives,  on  the  other  hand,  are  found  nestled  in  the  middle  of  the  frequency 
spectrum,  among  vowel  and  pitch  frequencies,  so  they  tend  to  become  obscured  and 
overpowered  by  other  sound  events.  Fricatives  stand  out  far  more  than  most  other  consonants, 
and,  depending  on  the  acoustic  properties  of  the  room  and  the  size  of  the  choir,  may  completely 
dominate  a  phrase. 


Figure 4.8 - Spectrogram of “this who speaks such” [ðɪs hu spiks sʌtʃ]
sung  by  St.  Olaf  Cantorei  Choir  in  anechoic  chamber 


The [s] sounds can be seen as a dark region in the 2 to 10 kHz range in figure 4.8.
Visually, the [s] sounds tend to dominate the overall sound in figure 4.9, in the reverberant
Esplanade Theatre. One [s] sound barely fades away before another [s] sound occurs in the more
reverberant space, somewhat obscuring the vowel sounds which occur lower in the frequency
spectrum.


Figure 4.9 - Spectrogram of “this who speaks such” [ðɪs hu spiks sʌtʃ]
sung  by  St.  Olaf  Cantorei  Choir  in  Medicine  Hat’s  Esplanade  Theatre 

In  order  to  help  reduce  the  powerful  masking  effect  of  sibilants,  some  choir  conductors 
suggest  that  certain  members  of  the  choir  omit  the  [s]  sounds  altogether.40  This  may  be  a  useful 
practice,  but  it  would  seem  that  at  the  very  least,  some  effort  needs  to  be  made  for  singers  to 
coordinate  their  [s]  sounds  and  other  sibilants  if  text  is  to  be  clearly  understood  in  reverberant 
spaces. 

Approximants 

The final category of consonants to examine is approximants [w r j l]. These types of
consonant sounds are relatively easy to hear. Intelligibility was greater than 50% for each
performance venue in the listener feedback. Although they sometimes disappear in a wet
acoustic, and are occasionally confused with one another, they remain one of the easier
consonant groups to decipher.

40 Don L. Collins, Teaching Choral Music, 2nd ed. (Upper Saddle River, NJ: Prentice Hall, 1999), 299.

Approximants,  like  plosives,  are  identified  by  the  movement  of  articulators,  shown  by 
moving formants in the spectrogram of figure 4.10.


Figure 4.10 - Spectrogram of “why, lie, rye” [waɪ laɪ raɪ] spoken by a single voice

The [w] sound features upward movement of the first and second formants below
1000 Hz, while the [r] sound on the right features a diagonal upward energy cluster seen in a grey
region just above the second formant between 1 and 2 kHz just after 8.5 s, which dissipates once
the vowel is fully formed at 8.55 s. This same pattern is visible in the [r] sounds of figure 4.4.
The [l] is different from the other two shown here; it has a low energy presence below 500 Hz at
7.4 s before the other formants appear. This corresponds to a very low position of the
body of the tongue for the [l] formation when the tip of the tongue is behind the upper teeth. The
second half of each word has similar formant movements since each word features the diphthong
[aɪ], characterized by formants one and two moving apart from one another, and formant three
moving slightly downward.




In  the  reverberant  acoustic  spaces,  approximants  became  somewhat  obscured.  Notice  the 
change  from  the  anechoic  chamber  to  the  Medicine  Hat  Esplanade  Theatre  for  the  word 
‘prophet’  in  figure  4.11. 


Figure 4.11 - Spectrograms of “some prophet” [sʌm prɑfɪt] sung by St. Olaf Cantorei Choir; on
the left, in anechoic chamber; on the right, in Medicine Hat’s Esplanade Theatre


In the anechoic recording, the [r] is identified from the upward diagonal energy cluster in
the 2 kHz range along with a lot of low frequency energy that indicates a closed mouth and high
tongue position. The consonant [p] is also clear from the gap in vowel formants that occurs at
about 84.4 s, shown by the red vertical line. The [f] and [t] sounds are also evident from higher
frequency energy clusters between 4 and 7 kHz in the red rectangles. In the reverberant
recording, the [r] formant pattern of low frequency energy and steep upward diagonal formant
movement is still somewhat visible, while the other consonant identifiers are more obscured by
reverberant sound energy.

The  high  intelligibility  of  approximants  raises  an  interesting  question.  If  both  plosives 
and  approximants  depend  on  formant  movement  to  be  understood,  why  is  intelligibility  of 
plosives  so  much  lower  than  approximants  in  reverberant  spaces?  The  answer  may  be  found  in 
the  difference  between  formant  behaviour  in  plosives  and  approximants.  For  plosives,  the 
transitory  period  is  very  quick,  and  sometimes  very  small  in  terms  of  frequency  change,  which 
leaves it more susceptible to blurring from reflected sound. The decay time for reflected sound is
much longer than the transition time for a plosive consonant (as well as for the onset of
nasals). The formant transitions for approximants usually occur over a longer period of time
relative to the decay time for reflected sound, so they are more resilient to reverberant spaces and the
blurring that they produce. Furthermore, for approximants, a great number of formants move in
more  significant  ways,  which  makes  the  changes  easier  to  detect,  even  with  the  blurring  of 
reverberant  sound. 
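The relationship between transition time and decay time can be illustrated with a short calculation. Reverberation time is conventionally the time taken for sound to decay by 60 dB, so under an idealized steady decay the amount by which earlier sound has faded during a transition is proportional to the length of that transition. The durations and the reverberation time below are hypothetical values chosen only for illustration; they were not measured in this study.

```python
def decay_during(transition_s: float, rt60_s: float) -> float:
    """Decibels by which earlier sound has faded after transition_s seconds,
    assuming an idealized steady decay of 60 dB over rt60_s seconds."""
    return 60.0 * transition_s / rt60_s


RT60_S = 2.0  # hypothetical reverberation time of a 'live' hall
TRANSITIONS_S = {"plosive": 0.03, "approximant": 0.10}  # hypothetical durations

for name, duration in TRANSITIONS_S.items():
    faded = decay_during(duration, RT60_S)
    print(f"{name}: earlier sound has faded by {faded:.1f} dB")
```

Under these assumptions the preceding sound has faded by less than 1 dB by the time a plosive transition is over, so the transition is almost fully overlaid by the reverberant tail, whereas the longer approximant transition allows somewhat more of that tail to die away.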

Since the performance we are studying is of a professional quality, we do not encounter
the common problem of the prominent American [r] sounds. However, the overpowering nature
of [r] sounds was still evident in the listener feedback with the addition of reverberant energy. In
very reverberant spaces, some listeners could not decipher any words, and detected [r] sounds
throughout. Some common solutions to minimize the prominence of [r] sounds include omitting
[r] sounds when they occur before a consonant or a pause41 (e.g. “word”) or gently flipping [r]
sounds before a vowel sound42 (e.g. “or a”). For [r] sounds, the clarity of text is not so much at
stake as the beauty of the overall choral sound. In the end, directors will choose how to deal with
[r] sounds depending on the style of music, the words, and the singers involved.

41 Lamb, 81; Ray Robinson and Allen Winold, The Choral Experience: Literature, Materials and Methods (Prospect
Heights, IL: Waveland Press, c1976), 120.


Summary 

The  diction  difficulties  noted  in  the  listening  exercise  and  data  collection  are  provided  in 
table  4.2,  along  with  corresponding  common  solutions  recommended  by  conducting 
professionals. 


Table 4.2 - Overview of Common Diction Problems and Solutions

Plosives [p t k] [b d g]

Source of Problem: Intelligibility is found in transition in frequency from
consonant to vowel. These transitions are obscured by
reverberance, and consonant sound is lost.

Common Solution: Exaggerate the consonant - make it louder.

Evaluation of Common Solution: Exaggeration will not fix the problem of a fuzzy
transition. It will add to ‘noise’, not intelligibility.

Nasals [m n ŋ]

Source of Problem: Intelligibility is surprisingly low for nasals in live acoustical
spaces. The nasal sound is not ‘lost’, as with plosives, but is
‘confused’ so that [m], [n] and [ŋ] cannot be distinguished.

Common Solution: Carefully time their onset, and make them last longer.

Evaluation of Common Solution: This may help to a small extent, but more specific
work needs to be done to help distinguish these sounds for the listener.

Fricatives [f θ s ʃ] [v ð z ʒ]

Source of Problem: Fricatives become overpowering because they exist higher on
the frequency spectrum, and with greater power than other
consonants. Furthermore, they do not involve a ‘transition’, but
rather a steady-state frequency, which becomes exaggerated
rather than obscured in a live acoustic.

Common Solution: Teach the choir to make them shorter, or have fewer singers
sing them.

Evaluation of Common Solution: This method will be effective at reducing the
overpowering fricative in a large hall.

Approximants [w r j l]

Source of Problem: Approximants, like fricatives, tend to become overpowering,
especially [r] sounds.

Common Solution: For [r] sounds, teach choir to add precision to the beginning
and ending of words, and to flip [r]s where appropriate.

Evaluation of Common Solution: Adding precision, and flipping [r]s where
appropriate, are good solutions to the problem of exaggerated approximants in large
venues.

42 Robinson and Winold, 120.

When the spectrograms of speech are meticulously analysed, it is clear that there are
many problems with the most popular methods of improving choral diction. Perhaps this is
because it is assumed that consonants are self-contained units of speech. However, in reality,
most consonants cannot be separated from the vowels that surround them. The biggest
discrepancy is in the advice to exaggerate consonants for improved clarity. For stop consonants,
simply making the consonants louder will likely exacerbate the problem. In fact, if the high
frequency elements of plosive consonants are removed from a recording, the brain can usually
still interpret what is being said.43 This is because it is actually the transition of the articulators as
they move between consonant and vowel sounds that allows the listener to understand words.
Without a clear transition, there is no clear diction.

43 Ladefoged, 100.

The  ultimate  goal  in  good  choral  diction  is  kinesthetic  synchronization.  Choristers  need 
to learn to shape their mouths in the same way for consonants and vowels, and they must all
move  their  mouths  in  the  same  way  and  at  the  same  time  from  vowel  to  consonant  to  vowel. 

The  fact  that  multiple  voices  are  singing  together  in  a  choral  context  means  that  the 
phonetic  outcome  will  be  inherently  less  clear.  This  is  strictly  the  result  of  the  multiplicity  of 
voices  doing  slightly  different  things  at  slightly  different  times.  A  reverberant  room  acoustic  only 
exacerbates  the  problem.  Since  there  are  two  significant  and  overlapping  causes  of  text  blurring 
(multiplicity  of  voices  and  reverberant  acoustics),  choral  conductors  must  make  great  effort  to 
ensure  that  the  vocal  input  (the  pure  choral  sound)  that  is  put  into  a  room  is  as  clear  as  possible. 
Conductors  must  also  use  the  acoustical  qualities  of  the  room  to  their  best  advantage,  as  will  be 
discussed  in  chapter  five. 

Supporting  Research  on  Kinesthetic  Synchronization 

Within  the  choral  context,  the  goal  is  to  make  many  voices  seem  like  only  one.  Although 
this  objective  is  not  often  discussed  when  experts  give  their  advice  about  choral  diction,  there  is 
one  researcher  who  tackles  this  idea  in  his  doctoral  thesis.  Robert  Fisher  examines  kinesthetic 
awareness  and  synchronization  of  articulation  in  order  to  improve  intelligibility,  and  promotes 
this  approach  above  any  other  method. 

Fisher’s research involves disciplined and specific training of choirs using visualization
techniques and specific articulatory training.44 These methods of diction training, which are
similar to those suggested by a minority of authors writing about diction, are upheld by scientific
evaluation of speech as described above. If singers learn to move their mouths in the same way at
the same time, they will become more like a single audible unit; formant movements will
become clearer, and more intelligible diction will result. As an added benefit, this method puts an
emphasis on correct vowel formation, which will also improve the overall sound, blend, and
resonance of the choir.

44 Robert E. Fisher, “The Design, Development, and Evaluation of a Systematic Method for English Diction in
Choral Performance,” Journal of Research in Music Education 39 no. 4 (Winter 1991): 274.

His  methods  involve  intentional  practice  in  forming  various  words  together  as  an 
ensemble.  This  type  of  kinesthetic  training  would  require  a  significant  time  commitment  for 
repetitive  and  slow-motion  training  and  muscle  memory  reprograming  in  order  to  undo  bad 
habits  of  pronunciation  that  inevitably  exist  within  the  choral  context. 

Particularly  for  plosive  consonants  and  nasal  onsets,  which  are  deciphered  by  quick  and 
subtle  formant  movement,  it  is  imperative  that  singers  start  and  end  with  the  same  mouth  shape 
in  any  given  transition.  Otherwise,  the  brain  will  not  register  a  clear  consonant  sound.  To  make 
matters  worse,  the  warm  reverberant  acoustic  of  most  performance  spaces  will  take  any  lack  of 
transitional  clarity  and  obscure  it  beyond  recognition.  If  increased  intelligibility  is  to  be  found, 
singers  must  be  brought  back  to  the  basics  of  how  to  use  the  teeth,  tongue,  lips  and  palate  to 
produce  sound.  This  would  be  best  achieved  over  a  long  period  of  time  with  the  same  choir  and 
conductor. 

Other authors support the use of synchronization techniques similar to those suggested by
Fisher, although these techniques are not specifically tested. One such method is the
synchronization of consonants by placing them on or before the ictus, as per the conductor’s
direction. Another method is visualization, which focuses on the physical habits of speech and
speech formation. For example, a conductor might teach the choir to sing an [l] with the tongue
right behind the upper front teeth, thinking of it as transparent, “like glass, so that air can pass by
on either side of it.”45 This type of visualization technique could be applied to other consonant
and vowel formations. However, further studies would be necessary to determine the
effectiveness of such techniques in creating uniformity among all singers. Other helpful
strategies involve asking the choir to sing consonants on the same pitch as the vowel that
follows.46 This method encourages singers to have a similar mouth shape as they transition from
consonant to vowel.

Fisher’s  research  supports  the  intelligibility  findings  presented  in  this  essay.  He  found 
that  text  intelligibility  was  still  less  than  50%,  even  after  implementing  his  kinesthetic 
articulation  training.  Even  if  it  were  possible  to  achieve  complete  uniformity  amongst  all  singers, 
intelligibility  will  be  limited  by  the  physical  characteristics  of  performance  venues.  Therefore, 
conductors  must  consider  alternate  ways  of  enhancing  the  audience’s  experience.  They  must 
examine  architectural  acoustics,  and  look  for  an  optimal  balance  between  acoustical  beauty  and 
warmth,  and  textual  clarity.  They  should  also  look  at  psychological  considerations  such  as 
contextualization in order to utilize triggers and cues to help the audience understand more of
what  is  being  communicated. 


45 Wilhelm Ehmann, Choral Directing, translated by George T. Wiebe (Minneapolis: Augsburg Pub. House, 1968),
58.

46 Robinson and Winold, 120.




Chapter  5  -  Other  Considerations  for  Text  Intelligibility 

It  is  evident  in  reviewing  the  listener  data  that  once  listeners  were  able  to  pick  up  a  few 
words,  the  subsequent  excerpts  were  easier  to  decipher.  On  the  other  hand,  if  they  heard 
gibberish  for  the  first  few  sound  samples,  they  were  more  likely  to  hear  only  gibberish  for  the 
rest  of  the  samples.  It  is  a  fairly  obvious  conclusion  that  if  the  listeners  were  able  to  determine 
some  context  for  the  song,  the  subsequent  words  were  easier  to  understand.  It  is  a  well-known 
principle  of  phonetics  that  when  a  listener  is  given  a  topic,  it  becomes  much  easier  to  interpret 
what is being said. This applies as much to choral music as it does to a second-language learner.

Similarly,  there  are  psychological  triggers  for  audience  members  that  aid  in 
understanding  text.  One  important  trigger  or  aid  is  lip  reading.  If  the  hall  is  small  enough  (and 
the  audience  members’  vision  good  enough),  reading  lips  can  be  an  important  way  to  enhance 
text  comprehension.47 

Another  more  direct  type  of  aid  is  simply  to  print  the  text  in  the  program,  or  to  include 
subtitles,  even  for  English  text.  While  some  conductors  might  see  this  as  a  distraction  or  even  an 
excuse  for  lazy  diction,  the  reality  of  choral  music  is  that  achieving  a  high  level  of  intelligibility 
will  always  require  some  level  of  external  enhancement  or  support. 

Instead  of  providing  text  directly,  choirs  may  consider  supplying  detailed  introductions  to 
choral  pieces  in  order  to  help  the  audience  understand  key  words.  In  addition,  to  help  with  lip 
reading,  video  cameras  could  be  set  up  to  provide  an  enlarged  live  projection  of  singers. 
Whatever  the  preferred  method,  enhancements  could  easily  be  incorporated  to  make  choral 
concerts  more  meaningful  and  interactive  experiences. 

Along with audience aids, the acoustical properties of a performance hall should be taken
into consideration in an effort to promote clear diction. It is important to briefly explore
architectural acoustics in order to determine what measures can be taken to improve diction as
effectively as possible.

47 Ladefoged, 109-110.

High  reverberation  times  usually  occur  in  larger  performance  halls  with  harder  surfaces. 
These  characteristics  are  very  good  for  rich,  warm  musical  sound,  but  they  distort  diction.  When 
choosing  a  performance  venue,  a  conductor  must  consider  the  drawbacks  of  a  very  large  hall. 
Opera houses, for instance, are designed with a lower RT than concert halls (in the range of 1 s)
because  they  are  designed  for  text.  A  conductor  may  choose  a  smaller  venue  for  a  choral 
performance  than  for  an  instrumental  one,  because  of  the  importance  of  text. 

Alternatively,  a  conductor  could  add  curtains  or  other  acoustical  treatments  to  a 
performance  space,  if  it  is  a  place  that  is  often  used  for  choral  performances.  In  the  listening 
examples,  it  was  shown  that  adding  curtains  to  the  SFU  Experimental  Theatre  had  a  marginal 
impact  on  intelligibility,  increasing  textual  clarity  by  about  5%.  When  building  new  venues  for 
live non-amplified performance, it would be helpful for acousticians and choral directors to
consult  with  each  other  in  order  to  determine  what  room  size  and  shape,  type  of  seating,  etc. 
would  be  most  appropriate  for  choral  performance.  Although  important,  text  intelligibility  would 
be  just  one  of  several  objectives  to  consider  in  designing  an  ideal  choral  performance  venue,  and 
would  have  to  be  balanced  with  the  need  for  a  warm,  reverberant  sound  in  order  to  enhance  the 
overall  musical  experience. 

Loudness  is  another  key  factor  in  understanding  text,  and  one  that  is  not  often  considered. 
With  all  other  factors  being  equal,  the  larger  the  performance  venue,  the  lower  the  acoustic 
strength.  Therefore,  although  larger  performance  venues  tend  to  produce  a  warm  and  rich 
reverberant sound, they also tend to have a lower acoustic strength than a smaller hall. As a result,
the audience may have trouble hearing the words in a larger hall simply because the words are
too quiet.

The  issue  of  strength  is  also  compounded  by  the  problem  of  outside  noise.  If  the 
performance  venue  is  not  isolated  from  external  noise  (from  an  adjacent  lobby  or  other  rooms, 
from  HVAC48  noise  or  even  from  trains  or  other  vibrating  external  sources),  then  the  text  will  be 
much  harder  to  decipher.  The  ideal  situation  is  to  have  a  performance  space  that  is  acoustically 
isolated,  or  at  the  very  least,  to  ensure  that  the  adjacent  spaces  remain  quiet  during  a 
performance. 

Early  reflections,  as  discussed  in  chapter  two,  are  another  important  key  to  textual  clarity. 
The  stronger  the  early  reflections,  the  easier  it  is  for  the  brain  to  understand  text.49  Early 
reflections  can  be  boosted  in  a  couple  of  ways.  One  way  is  to  use  a  choir  shell  to  increase  the 
number  of  first  order  reflections  that  bounce  straight  toward  the  listener.  Another  way  is  to 
ensure  that  the  room  is  being  used  in  the  best  possible  configuration.  If  there  is  a  stage  and  a 
proscenium  arch,  the  choir  should  sing  from  under  the  arch  or  just  in  front  of  it,  and  not  too  far 
back  on  the  stage.  This  will  ensure  that  more  sound  waves  are  reflected  directly  toward  the 
audience,  rather  than  reflecting  within  the  stage  before  reaching  the  audience,  thereby  increasing 
late  reflected  energy.  If  the  room  is  rectangular,  the  choir  should  perform  against  the  shorter 
wall,  not  the  longer  wall.  (Most  concert  hall  and  church  stages  are  set  up  this  way,  for  good 
reasons,  acoustically  speaking.)  This  may  be  counter-intuitive,  since  much  of  the  audience  will 
be  further  away  from  the  choir  in  this  configuration.  However,  because  of  increased  first-order 
reflections from side-walls, the longer, narrower hall will produce greater textual clarity than
the  shorter,  wider  hall. 
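
The geometric argument above can be illustrated with a simple image-source calculation. The following is a minimal sketch rather than part of the study: it assumes a bare rectangular room, a choir and a listener both on the centre line, a speed of sound of 343 m/s, and purely hypothetical room dimensions. The function name side_wall_delay_ms is invented for illustration. It estimates how long after the direct sound the first-order side-wall reflection arrives; reflections arriving within roughly the first 50 ms are the ones counted as early energy by the D50 measure.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, assumed (air at about 20 degrees C)


def side_wall_delay_ms(room_width, source_to_listener):
    """Delay (ms) of the first-order side-wall reflection after the direct sound.

    Assumes source and listener both sit on the centre line of a rectangular
    room. Mirroring the source across one side wall places its image a full
    room width to the side, so the reflected path is the hypotenuse of a
    right triangle with legs source_to_listener and room_width.
    """
    direct_path = source_to_listener
    reflected_path = math.hypot(source_to_listener, room_width)
    return (reflected_path - direct_path) / SPEED_OF_SOUND * 1000.0


# Hypothetical dimensions: the same floor plan used two different ways.
narrow = side_wall_delay_ms(room_width=15.0, source_to_listener=20.0)
wide = side_wall_delay_ms(room_width=30.0, source_to_listener=10.0)
print(f"choir on short wall (narrow hall): {narrow:.1f} ms")
print(f"choir on long wall (wide hall):    {wide:.1f} ms")
```

With these assumed dimensions the side-wall reflection arrives about 15 ms after the direct sound in the narrow configuration and about 63 ms after it in the wide one, even though the listener in the narrow hall sits twice as far from the choir.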

48  HVAC  stands  for  Heating,  Ventilation  and  Air  Conditioning. 

49  Beranek,  24. 




Chapter 6 - Conclusions and Future Work


This  study  examined  intelligibility  of  choral  text  in  different  performance  venues. 
Listeners  provided  intelligibility  feedback  for  four  different  simulated  performance  venues,  each 
with  its  own  distinct  acoustical  qualities.  In  general,  text  intelligibility  proved  to  be  lower  in 
concert halls with a higher reverberation time, and higher in concert halls with a higher acoustic
strength.  In  addition,  halls  with  higher  acoustical  clarity,  which  corresponds  with  a  high 
proportion  of  early  reflected  energy,  had  higher  intelligibility. 

After  a  closer  look  at  the  effects  of  room  acoustics  on  the  intelligibility  of  each  consonant 
type,  it  was  determined  that  plosive  and  nasal  consonants  were  the  most  problematic  in  more 
reverberant  spaces.  Through  an  exploration  of  some  commonly  used  diction  techniques, 
combined  with  spectrogram  analyses,  it  was  concluded  that  one  key  method  of  achieving  clearer 
diction  and  overcoming  acoustical  distortion  is  synchronization  of  speech.  This  method  was 
found  to  have  a  more  scientific  basis  when  compared  to  many  other  common  practices  for  the 
improvement  of  choral  diction,  most  notably,  the  technique  of  exaggeration  of  consonant  sounds. 

The  basis  for  this  conclusion  lies  in  the  premise  that  for  plosives  and  nasals  in  particular, 
the  understanding  of  text  is  achieved  by  teaching  careful,  synchronized  mouth  transitions 
throughout  the  ensemble,  as  opposed  to  merely  increasing  the  power  of  the  explosive  consonants 
or  using  other  methods  of  exaggeration.  The  time  and  effort  that  a  choir  devotes  to  diction  should 
therefore  focus  on  forming  the  same  shapes  with  the  mouth  and  vocal  articulators,  and 
synchronizing  each  change  of  shape  when  transitioning  between  consonant  and  vowel  sounds. 
Such effort would help to mitigate some of the obstacles to clarity of text that are inherent in
large, reverberant acoustical spaces.




This  same  solution  of  kinesthetic  synchronization  as  opposed  to  exaggeration  appears  to 
also  apply  to  fricatives.  Unlike  plosives  and  nasals,  fricatives  are  not  identified  by  transient 
behaviour  (movement  in  the  mouth),  but  rather  by  steady  state  high  frequency  energy.  As  such, 
these  types  of  consonants  tend  to  become  longer  and  louder  in  reverberant  spaces.  Therefore, 
clean  and  careful  production  of  these  consonants  through  synchronization  will  prevent  the 
sounds  from  blurring  and  overwhelming  the  entire  choral  sound. 

Like  plosives  and  nasals,  approximants  have  transitory  behaviour,  but  the  mouth 
movements  are  somewhat  slower  yet  more  extreme,  making  intelligibility  within  a  reverberant 
space  easier  to  achieve.  As  is  the  case  with  fricatives,  choirs  must  be  careful  that  production  of 
approximants is unified, both in terms of timing and mouth shape, so that they do not become
excessively  blurred,  obscuring  the  surrounding  text. 

As  choral  directors,  we  must  accept  that  no  choir  is  capable  of  conveying  all  or  even  most 
of  the  text  to  the  audience  in  most  live  performance  venues.  In  addition,  choirs  should  make  use 
of  large  reverberant  halls  for  performance,  in  order  to  achieve  a  richer,  fuller  sound,  while  being 
cognizant  of  the  fact  that  these  reverberant  spaces  work  directly  against  the  ability  of  the 
audience  to  understand  the  text. 

The difficult reality about textual intelligibility is that it tends to remain below 50% for
most performance venues. Because more absorptive, smaller performance halls are an
inadvisable option in terms of achieving a rich, reverberant sound, choirs must find ways to
assist the audience more directly by providing psychological cues, such as context, lip reading or
printed text, to increase understanding. In addition, with sufficient resources, choirs may be able
to alter a performance space by using choir shells, choosing the most advantageous room




configurations,  or  adding  curtains,  in  order  to  improve  textual  clarity  without  compromising  a 
warm,  reverberant  choral  sound;  or  a  choir  may  simply  choose  a  different  performance  venue. 

Since  choral  acoustics  is  a  new  and  relatively  unknown  interdisciplinary  field,  there  is 
much  more  that  could  be  explored,  such  as  the  effects  of  acoustical  reflectors,  choir  formation,  or 
rehearsal  space  acoustics  on  text  clarity.  It  would  also  be  worthwhile  to  explore  the  effectiveness 
of  various  teaching  strategies  pertaining  to  choral  diction,  including  vowel  formation,  consonant 
formation,  and  synchronization  training. 

In  rehearsal  situations  and  conducting  textbooks,  far  too  many  musicians  approach  the 
problem  of  textual  clarity  by  simply  proposing  what  they  were  taught,  or  by  relying  on  subjective 
feedback  to  improve  intelligibility.  With  the  findings  of  this  study,  and  with  further  related 
research,  future  choral  educators,  conductors  and  textbook  authors  will  have  a  more  informed, 
scientific  perspective  of  acoustics  and  phonetics,  and  will  therefore  be  able  to  more 
knowledgeably  impart  to  choirs,  conductors  and  audiences  the  principles  of  effective  diction. 




Bibliography 

Anechoic Choral Recordings. St. Olaf Cantorei Choir, conducted by Dr. John Ferguson.
Recorded October 21, 2003. Owatonna, MN: Wenger Corporation, 2004. 1 compact disc,
1 data DVD.

Ashby,  Michael  and  John  Maidment.  Introducing  Phonetic  Science.  Cambridge:  Cambridge 
University  Press,  2005. 

Barron,  Michael.  Auditorium  Acoustics  and  Architectural  Design.  NY:  Taylor  &  Francis,  2009. 

Batey,  Angela  L.  “Preparing  a  High  School  Choir  for  Adjudication.”  Choral  Journal  48,  no.  6 
(2007):  73-75. 

Beranek, Leo. Concert and Opera Halls: How They Sound. Woodbury, NY: American Institute
of Physics, c1996.

— .  Concert  Halls  and  Opera  Houses:  Music,  Acoustics  and  Architecture.  2nd  Ed.  NY : 
Springer,  2004. 

Campbell, Murray. "Reverberation time." In Grove Music Online. Oxford University
Press, 2007-. Accessed June 19, 2012.
http://www.oxfordmusiconline.com.login.ezproxy.library.ualberta.ca/subscriber/article/grove/music/23282.

Collins,  Don  L.  Teaching  Choral  Music.  2nd  Ed.  Upper  Saddle  River,  NJ:  Prentice-Hall  Inc., 
1999. 

Decker,  Harold  A.  Choral  Conducting:  Focus  on  Communication.  Prospect  Heights,  IL: 
Waveland  Press,  1995. 

Ehmann,  Wilhelm.  Choral  Directing.  Translated  by  George  T.  Wiebe.  Minneapolis:  Augsburg 
Pub.  House,  1968. 

Fisher, Robert E. “The Design, Development, and Evaluation of a Systematic Method for
English Diction in Choral Performance.” Journal of Research in Music Education 39, no.
4 (Winter 1991): 270-281.

Foulkes,  Timothy  J.  Halls  for  Music  Performance:  Another  Two  Decades  of  Experience  1982  - 
2002.  Melville  NY:  Acoustical  Society  of  America,  2003. 

Garretson,  Robert  L.  Conducting  Choral  Music.  6th  Ed.  Englewood  Cliffs,  NJ:  Prentice-Hall 
Inc., c1988.

Hardcastle, William J., John Laver and Fiona Gibbon, Eds. The Handbook of Phonetic Sciences.
2nd Ed. Malden, MA: Wiley-Blackwell, 2010.




Hewlett,  Nigel  and  Janet  Beck.  An  Introduction  to  the  Science  of  Phonetics.  Mahwah,  NJ: 
Lawrence  Erlbaum  Associates,  2006. 

Jones, Archie N. Techniques in Choral Conducting. NY: Carl Fischer Inc., c1948.

Ladefoged, Peter. Vowels and Consonants: An Introduction to the Sounds of Languages. Malden,
MA: Blackwell Pub., c2005.

Ladefoged, Peter. Elements of Acoustic Phonetics. Chicago: University of Chicago Press,
1996.

Lamb, Gordon H. Choral Techniques. 3rd Ed. Dubuque, IA: Wm. C. Brown, c1988.

Robinson, Ray and Allen Winold. The Choral Experience: Literature, Materials and Methods.
Prospect Heights, IL: Waveland Press, c1976.

Shield, Bridget and Trevor Cox. “Concert Hall Acoustics: Arts and Science” from University of
Salford Manchester, Acoustics, http://www.acoustics.salford.ac.uk/acoustics_info/
concert_hall_acoustics/?content=reverb. 1999/2000.

Turner,  Harry.  “Picture  of  Anechoic  Chamber.”  Photo:  National  Research  Council  of  Canada. 

Wilson,  Harry  Robert.  Artistic  Choral  Singing:  Practical  Problems  in  Organization,  Technique 
and Interpretation. NY: Schirmer Inc., 1959.




Appendix  A  -  CATT  Acoustic  Modeling  and  Predictions 


Figure A.1 - Medicine Hat Esplanade Theatre Three-Dimensional View from Above
Three-Dimensional  Model:  Esplanade  Theatre  with  1  source  location  and  14  receiver 
locations.  Each  colour  represents  a  different  surface  property  with  specific  sound  absorption 
characteristics.  Model  created  using  CATT  Acoustic. 




Figure  A.2  -  Queen  Elizabeth  Theatre  Three-Dimensional  View  from  Above 


Three-Dimensional  Model:  Queen  Elizabeth  Theatre  with  1  source  location  and  14 
receiver  locations.  Each  colour  represents  a  different  surface  property  with  specific  sound 
absorption  characteristics.  Model  created  using  CATT  Acoustic.  Grey  shapes  on  bottom  left 
correspond  with  ceiling  reflectors  in  audience  area  of  theatre. 




Figure  A.3  -  SFU  Experimental  Theatre  Three-Dimensional  View  from  Above;  Without  Curtains 


Figure  A.4  -  SFU  Experimental  Theatre  Three-Dimensional  View  from  Above;  With  Curtains 

Three-Dimensional  Models:  Simon  Fraser  University  Experimental  Theatre  with  1  source 
location  and  7  receiver  locations.  Each  colour  represents  a  different  surface  property  with 
specific  sound  absorption  characteristics.  The  first  model  has  hard  side  and  back  walls  in 
audience  area.  The  second  model  has  curtains  for  extra  sound  absorption  on  side  and  back  walls. 
Model  created  using  CATT  Acoustic. 
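
The effect of the added curtains on reverberation can be illustrated with Sabine's classical formula, RT60 = 0.161 V / A, where V is the room volume in cubic metres and A is the total absorption (surface area multiplied by absorption coefficient, summed over all surfaces). The sketch below is an illustration only and is not how CATT Acoustic produced its predictions. The volume, areas and absorption coefficients are hypothetical round numbers, not measurements of the SFU Experimental Theatre, and sabine_rt60 is an invented helper name.

```python
def sabine_rt60(volume_m3, surfaces):
    """Estimate reverberation time (seconds) with Sabine's formula.

    surfaces is a list of (area_m2, absorption_coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption


# Hypothetical room: 3000 cubic metres, 400 square metres of side and back
# wall, and 900 square metres of everything else (seats, floor, ceiling).
VOLUME = 3000.0
OTHER_SURFACES = (900.0, 0.40)

hard_walls = sabine_rt60(VOLUME, [(400.0, 0.05), OTHER_SURFACES])
with_curtains = sabine_rt60(VOLUME, [(400.0, 0.50), OTHER_SURFACES])

print(f"hard side and back walls: {hard_walls:.2f} s")
print(f"curtains on those walls:  {with_curtains:.2f} s")
```

Under these assumptions, covering the hard walls with absorptive curtains shortens the estimated reverberation time from roughly 1.3 seconds to roughly 0.9 seconds, the same direction of change as the two SFU models.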




Table A.1 - Results of CATT Acoustic Predictions

Overall Acoustic Predictions (Average All Seats)

                          RT (sec.)             EDT (sec.) (note 50)  C80 (dB)
                          500 Hz   1000 Hz      500 Hz   1000 Hz      500 Hz   1000 Hz
SFU Curtains              0.7      0.7          0.6      0.6          7.9      7.7
SFU No Curtains           1.2      1.2          1.1      1.1          3.7      3.9
Esplanade                 1.6      1.6          1.3      1.3          2.2      2.4
QET                       2.1      2.1          1.5      1.5          2.5      2.7

                          G (dB)                D50 (%) (note 51)
                          500 Hz   1000 Hz      500 Hz   1000 Hz
SFU Curtains              5        5            76       74
SFU No Curtains           7        7            58       59
Esplanade                 5        5            48       51
QET                       0        0            49       49

Acoustic Predictions (Specific Seats)

                          RT (sec.)             EDT (sec.)            C80 (dB)
                          500 Hz   1000 Hz      500 Hz   1000 Hz      500 Hz   1000 Hz
SFU Curtains (Seat 3)     0.7      0.7          0.6      0.6          8.2      7.7
SFU No Curtains (Seat 3)  1.2      1.2          1.1      1.1          3.5      2.4
Esplanade (Seat 2)        1.6      1.6          1.3      1.3          3.3      2.2
QET (Seat 3)              2.3      1.7          1.5      1.5          3.0      2.7

                          G (dB)                D50 (%)
                          500 Hz   1000 Hz      500 Hz   1000 Hz
SFU Curtains (Seat 3)     6        6            73       70
SFU No Curtains (Seat 3)  7        7            54       52
Esplanade (Seat 2)        6        6            50       50
QET (Seat 3)              1        0            46       46

50 Note: EDT is another acoustical quantity similar to RT. It stands for Early Decay Time, and for the sake of brevity
and simplicity, was not considered in this paper. However, acousticians may find the values and correlations for
EDT of interest.

51 Similarly, D-50, distinctness, is an important acoustical term similar to C-80, Clarity, which was not discussed in
this paper. Generally D-50 is useful for text clarity, while C-80 applies to music.
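
For readers unfamiliar with the clarity quantities in Table A.1, the sketch below shows how C80 and D50 are conventionally defined from a room impulse response (the definitions standardized in ISO 3382-1): C80 is the ratio, in decibels, of the energy arriving in the first 80 ms to the energy arriving afterwards, and D50 is the percentage of the total energy arriving in the first 50 ms. This is not the CATT Acoustic computation. The impulse response here is an idealized exponential decay generated from an assumed reverberation time, with no separate direct sound or discrete reflections, so its values are not expected to match the table; the function names are invented for illustration.

```python
import math


def exponential_decay(rt60, sample_rate, duration):
    """Idealized impulse response whose energy falls 60 dB in rt60 seconds."""
    n_samples = int(round(duration * sample_rate))
    return [math.exp(-6.91 * (i / sample_rate) / rt60) for i in range(n_samples)]


def clarity_metrics(impulse_response, sample_rate):
    """Return (C80 in dB, D50 in percent) for an impulse response."""
    energy = [x * x for x in impulse_response]
    n50 = int(round(0.050 * sample_rate))  # samples in the first 50 ms
    n80 = int(round(0.080 * sample_rate))  # samples in the first 80 ms
    total = sum(energy)
    early_50 = sum(energy[:n50])
    early_80 = sum(energy[:n80])
    late_80 = total - early_80
    c80 = 10.0 * math.log10(early_80 / late_80)
    d50 = 100.0 * early_50 / total
    return c80, d50


SAMPLE_RATE = 8000  # samples per second, assumed
for rt60 in (0.7, 2.1):  # shortest and longest RT values in Table A.1
    c80, d50 = clarity_metrics(exponential_decay(rt60, SAMPLE_RATE, 3.0), SAMPLE_RATE)
    print(f"RT = {rt60} s: C80 = {c80:.1f} dB, D50 = {d50:.0f} %")
```

Even in this simplified model, the shorter reverberation time yields the higher C80 and D50, which is the same ordering seen between the SFU Curtains and QET rows of the table.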




Appendix B - CATT Acoustic Auralizations

Please refer to attached CD for audio files listed below.


Track  1.  Produced  by  Wenger  Engineering  Ltd.  St.  Olaf  Cantorei  Choir  (80  voices)  performing 
a  piece  entitled  “Who  Is  This”  by  John  Ferguson  in  an  anechoic  chamber. 

Track  2.  Simulation  by  CATT  Acoustic  of  the  above  performance  in  Medicine  Hat’s  Esplanade 
Theatre. 

Track  3.  Simulation  by  CATT  Acoustic  of  the  same  in  Vancouver’s  Queen  Elizabeth  Theatre. 

Track  4.  Simulation  by  CATT  Acoustic  of  the  same  in  Simon  Fraser  University’s  Experimental 
Theatre  without  side  and  back  wall  curtains. 

Track  5.  Simulation  by  CATT  Acoustic  of  the  same  in  Simon  Fraser  University’s  Experimental 
Theatre  with  side  and  back  wall  curtains. 




Appendix  C  -  Listener  Feedback  Data 


Listener  Instruction  Sheet 

Name: _  (Listening  Group: _ ) 

Contact  Info  (email  /  phone  number): _ 

Thank  you  for  your  participation  in  my  research  on  Choral  Diction.  You  are  about  to  hear  six 
short  excerpts  from  a  piece  of  choral  music  sung  in  English.  You  may  listen  to  each  excerpt  up  to 
five  times. 

For  each  excerpt,  please  write  down  anything  that  you  hear  in  the  space  provided.  This  may 
include  whole  phrases,  words,  syllables,  or  any  consonant  or  vowel  sounds  that  you  can 
decipher,  in  the  order  in  which  you  hear  them.  Please  indicate  any  “nonsense”  syllables  even  if  it 
doesn’t  sound  like  English. 

Excerpt  1  - _ 


Excerpt  2  - 


Excerpt  3  - 


Excerpt  4  - 


Excerpt 5 -


Excerpt  6  - 


Thank  you,  again,  for  your  participation! 




Intelligibility Data for Phrase 1:

Excerpt 1 - "Who is this who walks among us"

Listener             Text  Vowels  Cons. clusters  Response
Maximum              7     8       8

Grp 1 - Anechoic
EB                   2     4       5               "Yesu is among us"
JL                   3     3       2               "Jesus... is this who"
AM                   1     7       3               "Yesu is Yesu Roxanna Rah-s"
JE                   1     5       2               "Ou sous is sti o .... Naah-s"
RH                   5     7       5               "so who is this who walks on water ah-s"
CK                   4     7       7               "who is this walks on manras"
average              2.7   6       4
%                    0.38  0.69    0.50

Grp 2 - Medicine Hat Esplanade Theatre
NM                   4     6       4               "my soul, who is this who has come now"
AV                   1     4       2               "ooo soayisuomomen"
TU                   2     4       4               "is hesu walks on ..."
SM                   0     4       1               "ee soo yee ya sa ya"
SL                   1     5       2               "jesu, is su so mo na"
SH                   0     3       1               "so we so son ah song a ma"
average              1.3   4       2
%                    0.19  0.54    0.29

Grp 3 - QET
AG                   7     8       8               "who is this who walks among us"
JS                   0     3       0               "sole la song la"
DU                   1     7       1               "In this whole rockstar above"
MK                   1     4       0               "is sue alta same"
DavU                 0     2       3               "ee-I-soo-so-mas"
JB                   1     3       2               "soo y is ooo"
average              1.7   5       2
%                    0.24  0.64    0.33

Grp 4 - SFU curtains
DC                   5     7       6               "who is this who walks on the grass"
BB                   7     8       8               "who is this who walks among us"
NV                   1     7       2               "ish-u-dis-is-u-rock-some-on"
MS                   5     7       6               "this is who walks among us?"
VS                   2     2       1               "who as song who is this woo"
BenB                 7     8       8               "who is this who walks among us"
average              4.5   7       5
%                    0.64  0.93    0.74

Grp 5 - SFU no curtains
HH                   0     3       2               "oo I se sul mix sue mix sun"
JK                   3     4       4               "if see is this who why song"
JE                   5     7       6               "who wis who is this who walks on watha"
NA                   0     0       0               "eso mu so ma ra o re sy to mura sta"
CS                   1     5       3               "fru r it is jesu warsa as"
JB*                  0     1       0               "Jesu"
average              1.5   3       3
%                    0.21  0.48    0.36
average w/o outlier  1.8   3.8     3
%                    0.26  0.48    0.38

Averages by venue:

Venue                Text  %     Vowels  %     Cons. clusters  %
anec                 2.67  0.38  5.50    0.69  4.00            0.50
MedH                 1.33  0.19  4.33    0.54  2.33            0.29
QET                  1.67  0.24  4.50    0.64  2.33            0.33
SFUc                 4.50  0.64  6.50    0.93  5.17            0.74
SFUnc (w/o outlier)  1.80  0.26  3.80    0.48  3.00            0.38

* Grey highlighted entry indicates an outlier that was not factored into overall averages.
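
The averages and proportions in the table above can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not the author's spreadsheet: it uses the text scores reported for two of the listening groups, divides each group average by the maximum text score of 7, and optionally leaves one listener out. Treating listener JB as the excluded outlier in the SFU no curtains group is an assumption inferred from the reported averages, and the function name summarize is invented for illustration.

```python
from statistics import mean


def summarize(scores, maximum, outlier=None):
    """Return (average score, average as a proportion of the maximum).

    scores maps listener initials to a count of correct items; the listener
    named by outlier, if any, is left out of the average.
    """
    kept = [score for listener, score in scores.items() if listener != outlier]
    average = mean(kept)
    return round(average, 2), round(average / maximum, 2)


# Text scores for Excerpt 1, taken from the table above.
anechoic = {"EB": 2, "JL": 3, "AM": 1, "JE": 1, "RH": 5, "CK": 4}
sfu_no_curtains = {"HH": 0, "JK": 3, "JE": 5, "NA": 0, "CS": 1, "JB": 0}

print(summarize(anechoic, maximum=7))                       # (2.67, 0.38)
print(summarize(sfu_no_curtains, maximum=7))                # (1.5, 0.21)
print(summarize(sfu_no_curtains, maximum=7, outlier="JB"))  # (1.8, 0.26)
```

The three printed pairs agree with the text-score averages and proportions listed for the anechoic group and for the SFU no curtains group with and without the outlier.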




Consonant Data for Phrase 1:

Excerpt: "Who is this who walks among us"

Listener  Fricatives  Plosives  Approximants  Nasals
Maximum   5           1         1             2

Grp 1 - Anechoic
EB        3           0         0             2
JL        3           0         0             0
AM        4           1         0             0
JE        4           0         0             0
RH        4           1         1             0
CK        4           1         1             1
average   3.6667      0.5       0.3333        0.5

Grp 2 - Esplanade Theatre
NM        3           1         0             0
AV        2           0         0             1
TU        3           1         1             0
SM        2           0         0             0
SL        3           0         0             1
SH        3           0         0             0
average   2.6667      0.3333    0.1667        0.333

Grp 3 - QET
AG        4           1         1             2
JS        2           0         0             1
DU        3           1         0             0
MK        2           0         0             0
DavU      3           0         0             0
JB        2           0         0             0
average   2.6667      0.3333    0.1667        0.5

Grp 4 - SFU curtains
DC        4           1         1             0
BB        4           1         1             2
NV        3           1         0             1
MS        4           1         1             2
VS        4           0         1             0
BenB      4           1         1             2
average   3.8333      0.8333    0.8333        1.167

Grp 5 - SFU no curtains
HH        3           1         0             0
JK        4           0         1             1
JE        3           1         1             0
NA        3           0         0             1
CS        4           0         1             0
JB        -           -         -             -
average   3.4         0.4       0.6           0.4

Averages by venue:

Venue     Fricatives  Plosives  Approximants  Nasals
anec      3.67        0.50      0.33          0.50
MedH      2.67        0.33      0.17          0.33
QET       2.67        0.33      0.17          0.50
SFUc      3.83        0.83      0.83          1.17
SFUnc     3.40        0.40      0.60          0.40



Intelligibility Data for Phrase 2:

Excerpt 2 - "Who is this who speaks such words"

Listener             Words  Vowels  Cons. clusters  Response
Maximum              6      6       6

Grp 1 - Esplanade Theatre
EB                   5      6       4               "Who is this who sees such love"
JL                   5      6       4               "Who is this who sees such love"
AM                   5      6       4               "who is this who sees such wor..."
JE                   0      5       0               "ou I sis u sis a"
RH                   4      6       3               "who is this who sees sosh wan"
CK                   5      6       4               "who is this who sees such work"
average              4.0    5.8     3.2
%                    0.67   0.97    0.53

Grp 2 - QET
NM                   5      6       4               "who is this who sees such woo"
AV                   1      4       0               "ooo y istituo"
TU                   5      6       4               "who is this who sees such..."
SM                   0      5       0               "oo yee see soo see so sha"
SL                   0      4       1               "u re je su si sa jeu"
SH                   0      2       0               "ow me ow e see sasha"
average              1.8    4.5     1.5
%                    0.31   0.75    0.25

Grp 3 - SFU Curtains
AG                   5      6       4               "who is this who sees such worth"
JS                   5      6       4               "who is this who sees such worth"
DU                   4      5       3               "who is this who saves us from"
MK                   4      6       3               "is this who sees such"
DavU                 5      6       4               "who is this who sees such love"
JB                   2      4       2               "this who sees sush"
average              4.2    5.5     3.3
%                    0.69   0.92    0.56

Grp 4 - SFU no curtains
DC                   5      6       5               "who is this who seeks such worth"
BB                   5      6       4               "who is this who sees such more"
NV                   5      6       5               "who is this who seeks such..."
MS                   5      6       5               "who is this who seeks such love"
VS                   5      6       4               "who is this who sees such wor"
BenB                 5      6       4               "Who is this who sees such worth"
average              5.0    6       4.5
%                    0.83   1.00    0.75

Grp 5 - Anechoic
HH                   6      6       6               "who is this who speaks such ra"
JK                   6      6       6               "who is this who speaks such wo"
JE                   0      1       0               "his wis dos see thas come"
NA                   6      6       6               "who is this who speaks such wor"
CS                   5      6       5               "who is this who seeks such wo"
JB                   5      6       5               "who is this who seeks such worth"
average              4.7    5.2     4.7
%                    0.78   0.86    0.78
average w/o outlier  5.6    6       5.6
%                    0.93   1.00    0.93

Averages by venue:

Venue                Words  %     Vowels  %     Cons. clusters  %
anec (w/o outlier)   5.60   0.93  6.00    1.00  5.60            0.93
MedH                 4      0.67  5.833   0.97  3.167           0.53
QET                  1.83   0.31  4.50    0.75  1.50            0.25
SFUc                 4.17   0.69  5.50    0.92  3.33            0.56
SFUnc                5.00   0.83  6.00    1.00  4.50            0.75



Consonant Data for Phrase 2:

Excerpt: "Who is this who speaks such words"

Listener             Sibilants  Plosives  Response
Maximum              5          2

Grp 1 - Medicine Hat Esplanade Theatre
EB                   5          0         "Who is this who sees such love"
JL                   5          0         "Who is this who sees such love"
AM                   5          0         "who is this who sees such wor..."
JE                   5          0         "ou I sis u sis a"
RH                   5          0         "who is this who sees sosh wan"
CK                   5          0         "who is this who sees such work"
average              5          0

Grp 2 - QET
NM                   5          0         "who is this who sees such woo"
AV                   1          0         "ooo y istituo"
TU                   5          0         "who is this who sees such..."
SM                   5          0         "oo yee see soo see so sha"
SL                   3          0         "u re je su si sa jeu"
SH                   3          0         "ow me ow e see sasha"
average              3.667      0

Grp 3 - SFU Curtains
AG                   5          0         "who is this who sees such worth"
JS                   5          0         "who is this who sees such worth"
DU                   5          0         "who is this who saves us from"
MK                   5          0         "is this who sees such"
DavU                 5          0         "who is this who sees such love"
JB                   5          0         "this who sees sush"
average              5          0

Grp 4 - SFU no curtains
DC                   5          1         "who is this who seeks such worth"
BB                   5          0         "who is this who sees such more"
NV                   5          1         "who is this who seeks such..."
MS                   5          1         "who is this who seeks such love"
VS                   5          0         "who is this who sees such wor"
BenB                 5          0         "Who is this who sees such worth"
average              5          0.5

Grp 5 - Anechoic
HH                   5          2         "who is this who speaks such ra"
JK                   5          2         "who is this who speaks such wo"
JE                   -          -         "his wis dos see thas come"
NA                   5          2         "who is this who speaks such wor"
CS                   5          1         "who is this who seeks such wo"
JB                   5          1         "who is this who seeks such worth"
average w/o outlier  5          1.6

Averages by venue:

Venue                Sibilants  Plosives
anec (w/o outlier)   5.00       1.60
MedH                 5          0
QET                  3.67       0.00
SFUc                 5.00       0.50
SFUnc                5.00       0.50



Intelligibility Data for Phrase 3:

Excerpt 3 - "Is it Moses or Elijah or some prophet of the Lord"

Listener             Text  Vowels  Cons. clusters  Response
Maximum              11    15      14

Grp 1 - QET
EB                   0     2       1               "glory"
JL                   4     8       3               "Who is this who sits... froggy of the Lord"
AM                   7     10      8               "Who is it, who is ... some prophet of the Lord"
JE                   0     2       2               "on ou oil sis sum pra..."
RH                   0     1       0               "ohea oo ay shay nom ba shooo sha mon nom"
CK                   5     8       8               "aaah Lord some profit of the Lord"
average              3     5       4
%                    0.24  0.34    0.26

Grp 2 - SFU curtains
NM                   7     11      9               "who is it? Who is it? Oh elijah a prophet of the Lord"
AV                   0     4       1               "stitusiobrofioso"
TU                   5     8       7               "who... elijah who is the prophet of the Lord"
SM                   0     5       2               "see two jah so sa pra my oh"
SL                   5     8       6               "is it jose ... prophet of my Lord"
SH                   4     6       5               "to owe owe to some prophet of the wall"
average              4     7       5
%                    0.32  0.47    0.36

Grp 3 - SFU no curtains
AG                   5     6       8               "... some prophet of the Lord"
JS                   4     6       7               "some profit of the lor"
DU                   7     10      9               "is it who saves glorious... or some prophet of the Lord"
MK                   4     5       6               "... prophet of the Lord"
DavU                 5     7       7               "o-sit-oo-sit some prophet of the Lord"
JB                   0     4       3               "prolita mala"
average              4     6       7
%                    0.38  0.42    0.48

Grp 4 - Anechoic
DC                   11    15      14              "is it moses or Elijah or some prophet of the Lord"
BB                   6     10      9               "who is this Moses some prophet of my Lord"
NV                   8     13      9               "who is it who is it Elijah some prophet of the Lord"
MS                   3     5       -               "this is ... is from the Lord"
VS                   7     8       7               "who is it, who is it, holy, some prophet of the Lord"
BenB                 11    15      14              "is it Moses or Elijah or some prophet of the Lord"
average              8     11      9
%                    0.70  0.73    0.67

Grp 5 - Esplanade Theatre
HH                   2     3       4               "ooh see to see ooh in ts in some trump in my Lord"
JK                   7     9       7               "who is it who whose if some prophet of the Lord"
JE                   10    15      12              "Who is this Moses or Elijah or some prophet of the Lord"
NA                   4     4       3               "who is it was holy righteous voice of... my Lord"
CS                   5     6       6               "uuu I some prophet of my Lord"
JB                   4     5       5               "... prophet of the Lord"
average              5     7       6
%                    0.48  0.47    0.44

Averages by venue:

Venue                Text  %     Vowels  %     Cons. clusters  %
anec                 7.67  0.70  11.00   0.73  9.33            0.67
MedH                 5.33  0.48  7.00    0.47  6.17            0.44
QET                  2.67  0.24  5.17    0.34  3.67            0.26
SFUc                 3.50  0.32  7.00    0.47  5.00            0.36
SFUnc                4.17  0.38  6.33    0.42  6.67            0.48



Consonant Data for Phrase 3:

Listener  Fricatives  Plosives (unvoiced)  Plosives (voiced)  Approximants  Nasals
Maximum   6           3                    1                  5             2

Grp 1
EB        0           0                    0                  2             0
JL        6           1                    1                  2             1
AM        6           3                    1                  2             1
JE        3           1                    0                  1             1
RH        3           0                    0                  0             0
CK        4           2                    1                  4             1
average   3.67        1.17                 0.50               1.83          0.67

Grp 2
NM        5           3                    1                  2             0
AV        4           1                    0                  1             0
TU        4           2                    1                  3             0
SM        3           2                    0                  1             0
SL        4           3                    1                  2             0
SH        4           3                    0                  1             1
average   4.00        2.33                 0.50               1.67          0.17

Grp 3
AG        4           2                    1                  2             1
JS        4           2                    0                  2             1
DU        6           3                    1                  4             1
MK        3           3                    1                  2             0
DavU      6           3                    1                  2             1
JB        1           2                    0                  1             0
average   4.00        2.50                 0.67               2.17          0.67

Grp 4
DC        6           3                    1                  5             2
BB        5           2                    1                  2             2
NV        6           -                    1                  3             1
MS        5           0                    1                  2             0
VS        6           3                    1                  3             1
BenB      6           3                    1                  5             2
average   5.67        2.33                 1.00               3.33          1.33

Grp 5
HH        4           1                    1                  2             1
JK        6           3                    1                  2             1
JE        6           2                    1                  4             2
NA        5           2                    1                  3             0
CS        3           2                    1                  2             1
JB        3           2                    1                  2             0
average   4.50        2.00                 1.00               2.50          0.83

Averages by venue:

Venue     Fricatives  Plosives (unvoiced)  Plosives (voiced)  Approximants  Nasals
anec      5.67        2.33                 1.00               3.33          1.33
MedH      4.50        2.00                 1.00               2.50          0.83
QET       3.67        1.17                 0.50               1.83          0.67
SFUc      4.00        2.33                 0.50               1.67          0.17
SFUnc     4.00        2.50                 0.67               2.17          0.67



Intelligibility Data for Phrase 4:

Excerpt 4 - "(Can we name this) suffering servant"

Listener             Text  Vowels  Cons. clusters  Response
Maximum              2     4       6

Grp 1 - SFU curtains
EB                   2     4       6               "come defend this suffering servant"
JL                   2     4       6               "counting ... suffering servant"
AM                   2     4       6               "come with end this suffering servantttt"
JE                   2     4       6               "come ... suffering servant"
RH                   1     4       3               "come we and ye suffring sherpage"
CK                   2     4       6               "suffering servant"
average              1.83  4.00    5.50
%                    0.92  1.00    0.92

Grp 2 - SFU no curtains
NM                   2     4       6               "can remain his suffering servant"
AV                   0     3       3               "cambrisofrinsermodge"
TU                   2     4       6               "who... suffering servant"
SM                   0     3       4               "do me oo me sa free serviced"
SL                   2     4       6               "can... make me suffering servant"
SH                   2     4       6               "to me suffering servant"
average              1.33  3.67    5.17
%                    0.67  0.92    0.86

Grp 3 - Anechoic
AG                   2     4       6               "... suffering servant"
JS                   2     4       6               "suffering servant"
DU                   2     4       6               "come in me suffering servant"
MK                   2     4       6               "can you blame this suffering servant"
DavU                 2     4       6               "Can you hear his suffering servant"
JB                   1     3       5               "suffering silent"
average              1.83  3.83    5.83
%                    0.92  0.96    0.97

Grp 4 - Medicine Hat
DC                   2     4       6               "Can he ... suffering servant"
BB                   2     4       6               "... suffering servant"
NV                   2     4       6               "... suffering servant"
MS                   2     4       6               "... suffering servant"
VS                   2     4       6               "... suffering servant"
BenB                 2     4       6               "can it be any suffering servant"
average              2.00  4.00    6.00
%                    1.00  1.00    1.00

Grp 5 - QET
HH                   2     4       6               "Tell me here your suffering servant"
JK                   2     4       6               "this suffering servant"
JE                   2     4       6               "who's this suffering servant"
NA                   2     4       6               "... suffering servant"
CS                   2     4       6               "tr... suffering servant"
JB                   2     4       6               "... suffering servant"
average              2.00  4.00    6.00
%                    1.00  1.00    1.00

Averages by venue:

Venue                Text  %     Vowels  %     Cons. clusters  %
anec                 1.83  0.92  3.83    0.96  5.83            0.97
MedH                 2.00  1.00  4.00    1.00  6.00            1.00
QET                  2.00  1.00  4.00    1.00  6.00            1.00
SFUc                 1.83  0.92  4.00    1.00  5.50            0.92
SFUnc                1.33  0.67  3.67    0.92  5.17            0.86



Consonant Data for Phrase 4:

Listener  Sibilants  Plosives  Approximants  Nasals
Maximum   4          2         2             2

Grp 1
EB        4          1         2             2
JL        4          1         2             2
AM        4          1         2             2
JE        4          1         2             2
RH        3          0         2             1
CK        4          1         2             2
average   3.83       0.83      2.00          1.83

Grp 2
NM        4          1         2             2
AV        -          0         2             1
TU        4          1         2             2
SM        4          0         2             0
SL        4          1         2             2
SH        4          1         2             2
average   3.83       0.67      2.00          1.50

Grp 3
AG        4          1         2             2
JS        4          1         2             2
DU        4          1         2             2
MK        4          1         2             2
DavU      4          1         2             2
JB        3          1         1             1
average   3.83       1.00      1.83          1.83

Grp 4
DC        4          1         2             2
BB        4          1         2             2
NV        4          1         2             2
MS        4          1         2             2
VS        4          1         2             2
BenB      4          1         2             2
average   4.00       1.00      2.00          2.00

Grp 5
HH        4          1         2             2
JK        4          1         2             2
JE        4          1         2             2
NA        4          1         2             2
CS        4          1         2             2
JB        4          1         2             2
average   4.00       1.00      2.00          2.00

Averages by venue:

Venue     Sibilants  Plosives  Approximants  Nasals
anec      3.83       1.00      1.83          1.83
MedH      4.00       1.00      2.00          2.00
QET       4.00       1.00      2.00          2.00
SFUc      3.83       0.83      2.00          1.83
SFUnc     3.83       0.67      2.00          1.50



Intelligibility  Data  for  Phrase  5: 


Excerpt  5  -  "Can  we  name  the  promised  Son" 

Listener Response                                Text   %      Vowels  %      Consonant clusters  %
Total                                            6             7              8

Grp 1: SFU no curtains
EB       "come remain upon... song"              0             5              1
JL       "can we made... son"                    3             4              3
AM       "can we make ... promised son"          4             6              6
JE       "can we make our promise son"           4             8              6
RH       "come ye pay na pay na play your song"  0             3              1
CK       "can we make a ... song song"           2             5              2
average                                          2.17   0.36   5.17    0.74   3.17                0.40

Grp 2: Anechoic
NM       "can we make a promise on"              3             7              5
AV       "can we make a promise"                 3             6              4
TU       "can we name upon the son"              4             6              4
SM       "can we name the promised son"          6             7              8
SL       "can we meet my promised son"           4             5              6
SH       "can we name the promised son"          6             7              8
average                                          4.33   0.72   6.33    0.90   5.83                0.73

Grp 3: Esplanade Theatre
AG       "can we meet upon this sign"            2             6              3
JS       "coming in the song"                    0             4              1
DU       "come in towards the sun"               1             2              2
MK       "can you name this ..."                 2             2              1
DavU     "Can me esos-sen-mis-a"                 1             4              1
JB       "aa sun callee"                         0             0              0
average                                          1.00   0.17   3.00    0.43   1.33                0.17
average w/o outlier                              1.20   0.20   3.60    0.51   1.60                0.20

Grp 4: Medicine Hat
DC       "Can he ... suffering servant"          2             4              6
BB       "... suffering servant"                 2             4              6
NV       "... suffering servant"                 2             4              6
MS       "... suffering servant"                 2             4              6
VS       "... suffering servant"                 2             4              6
BenB     "can it be any suffering servant"       2             4              6
average                                          2.00   1.00   4.00    1.00   6.00                1.00

Grp 5: QET
HH       "Tell me here your suffering servant"   2             4              6
JK       "this suffering servant"                2             4              6
JE       "who's this suffering servant"          2             4              6
NA       "... suffering servant"                 2             4              6
CS       "tr... suffering servant"               2             4              6
TB       "... suffering servant"                 2             4              6
average                                          2.00   1.00   4.00    1.00   6.00                1.00

Summary by venue
Total                                            2      0      4       0      6                   0
anec average                                     1.83   0.92   3.83    0.96   5.83                0.97
MedH average                                     2.00   1.00   4.00    1.00   6.00                1.00
QET average                                      2.00   1.00   4.00    1.00   6.00                1.00
SFUc average                                     1.83   0.92   4.00    1.00   5.50                0.92
SFUnc average                                    1.33   0.67   3.67    0.92   5.17                0.86
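
The "average" and "%" entries in the intelligibility tables appear to be the mean raw score across the listeners in a group, followed by that mean divided by the maximum possible score for the category. The following Python sketch, written under that assumption, checks the reading against the Grp 3 (Esplanade Theatre) rows for Excerpt 5, with and without the outlying listener. The function name group_summary is illustrative and does not come from the essay.

```python
# Scores for Excerpt 5, Grp 3 (Esplanade Theatre), as listed in the table:
# (words, vowels, consonant clusters) correctly identified per listener.
TOTALS = (6, 7, 8)  # maximum possible score in each category
GRP3 = {
    "AG": (2, 6, 3),
    "JS": (0, 4, 1),
    "DU": (1, 2, 2),
    "MK": (2, 2, 1),
    "DavU": (1, 4, 1),
    "JB": (0, 0, 0),  # the listener excluded in the "w/o outlier" row
}


def group_summary(scores, totals, exclude=()):
    """Return one (mean score, mean / maximum) pair per category."""
    kept = [row for name, row in scores.items() if name not in exclude]
    summary = []
    for column, total in zip(zip(*kept), totals):
        avg = sum(column) / len(column)
        summary.append((round(avg, 2), round(avg / total, 2)))
    return summary


print("all listeners:  ", group_summary(GRP3, TOTALS))
print("without outlier:", group_summary(GRP3, TOTALS, exclude=("JB",)))
```

Under this assumption the two printed rows agree with the "average" and "average w/o outlier" rows listed for Grp 3.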


Consonant Data for Phrase 5:


Listener Fricatives  Plosives (unvoiced)  Plosives (voiced)  Approximants  Nasals
Total    3           1                    1                  2             5

Grp 1
EB       1           1                    0                  0             0
JL       1           0                    0                  1             2
AM       2           1                    1                  2             3
JE       2           1                    0                  2             3
RH       1           1                    0                  1             0
CK       2           0                    0                  1             1
average  1.50        0.67                 0.17               1.17          1.50

Grp 2
NM       1           1                    0                  2             3
AV       1           1                    0                  2             2
TU       1           1                    0                  1             4
SM       3           1                    1                  2             5
SL       2           1                    1                  2             3
SH       3           1                    1                  2             5
average  1.83        1.00                 0.50               1.83          3.67

Grp 3
AG       3           1                    0                  1             2
JS       1           0                    0                  0             0
DU       1           0                    0                  1             1
MK       2           0                    0                  0             3
DavU     2           0                    0                  0             1
JB
average  1.80        0.20                 0.00               0.40          1.40

Grp 4
DC       3           1                    1                  1             3
BB       2           0                    0                  1             4
NV       0           0                    0                  2             1
MS       1           0                    0                  1             2
VS       3           0                    0                  1             1
BenB     3           0                    0                  2             2
average  2.00        0.17                 0.17               1.33          2.17

Grp 5
HH       2           1                    0                  1             1
JK       3           0                    0                  1             1
JE       2           0                    0                  1             1
NA       2           1                    0                  1             2
CS       2           0                    0                  1             2
JB       1           1                    0                  0             2
average  2.00        0.50                 0.00               0.83          1.50

Summary by venue
Total    3           1                    1                  2             5
anec     1.83        1.00                 0.50               1.83          3.67
MedH     1.80        0.20                 0.00               0.40          1.40
QET      2.00        0.17                 0.17               1.33          2.17
SFUc     2.00        0.50                 0.00               0.83          1.50
SFUnc    1.50        0.67                 0.17               1.17          1.50



Intelligibility  Data  Phrase  6: 


Excerpt 6 - "Can we name the heir of David"

Listener Response                                Text   %      Vowels  %      Consonant clusters  %
Total                                            7             8              9

Grp 1: Anechoic
EB       "Can we name the heir David"            6             8              9
JL       "Can we meet thee at all David"         3             5              4
AM       "Can we make the heir of David"         6             8              7
JE       "can we make our own diven"             2             4              3
RH       "come and make me your ulti-mage"       0             3              0
CK       "can me make me harrow di"              2             6              3
average                                          3.17   0.45   5.67    0.71   4.33                0.48

Grp 2: Esplanade Theatre
NM       "can we hear the heritage"              2             5              3
AV       "can we make the royalty"               2             4              3
TU       "can we name the errol divige"          3             7              6
SM       "Can we name thee aral dame ee"         3             7              5
SL       "can we make me little derange"         2             7              2
SH       "can we name the herald image"          4             7              6
average                                          2.67   0.38   6.17    0.77   4.17                0.46

Grp 3: QET
AG                                               0             0              0
JS       "can we hear the angels singing"        2             6              2
DU       "coming in me overtake me"              0             5              1
MK       "can we ..."                            2             2              2
DavU     "can you hear the herald teach"         1             4              2
JB       "ee callee elee"                        0             0              0
average                                          0.83   0.12   2.83    0.35   1.17                0.13

Grp 4: SFU curtains
DC       "Can we ? The erald image"              2             6              4
BB       "Can we name him herald david"          4             8              7
NV       "can we give him our"                   2             2              2
MS       "come redeem me"                        0             0              1
VS       "can we be be dero diviche"             2             6              3
BenB     "can we meet thee there, O David"       3             6              4
average                                          2.17   0.31   4.67    0.58   3.50                0.39

Grp 5: SFU no curtains
HH       "Come with me the arrow tegin"          0             5              1
JK       "can we maybe error delish"             2             8              3
JE       "Can we maybe hear a mahtech"           2             8              4
NA       "kind we may be er oh ke geish"         0             8              0
CS       "can we maybe herald hi insh"           2             7              4
JB       "Kindly name... given"                  1             4              2
average                                          1.17   0.17   6.67    0.83   2.33                0.26

Summary by venue
Total                                            7      0      8       0      9                   0
anec average                                     3.17   0.45   5.67    0.71   4.33                0.48
MedH average                                     2.67   0.38   6.17    0.77   4.17                0.46
QET average                                      0.83   0.12   2.83    0.35   1.17                0.13
SFUc average                                     2.17   0.31   4.67    0.58   3.50                0.39
SFUnc average                                    1.17   0.17   6.67    0.83   2.33                0.26



Consonant  Data  Phrase  6: 


Excerpt 6 - "Can we name the heir of David"

Listener Fricatives  Plosives - Unvoiced  Plosives - Voiced  Approximants  Nasals
Total    1           1                    2                  2             3

Grp 1
EB       1           1                    2                  2             3
JL       1           1                    2                  1             1
AM       1           1                    2                  2             1
JE       0           1                    1                  2             1
RH       0           1                    0                  1             1
CK       0           1                    1                  1             1
average  0.50        1.00                 1.33               1.50          1.33

Grp 2
NM       1           1                    0                  2             1
AV       1           1                    0                  2             1
TU       1           1                    1                  2             3
SM       1           1                    1                  2             3
SL       0           1                    1                  1             1
SH       1           1                    1                  2             3
average  0.83        1.00                 0.67               1.83          2.00

Grp 3
AG       0           0                    0                  0             0
JS       1           1                    0                  1             0
DU       0           1                    1                  1             1
MK       0           1                    0                  1             1
DavU     1           1                    1                  1             1
JB       0           0                    0                  0             0
average  0.33        0.67                 0.33               0.67          0.50

Grp 4
DC       1           1                    1                  2             1
BB       0           1                    2                  2             3
NV       0           1                    0                  2             1
MS       0           1                    0                  0             0
VS       0           1                    1                  2             1
BenB     1           1                    2                  2             1
average  0.33        1.00                 1.00               1.67          1.17

Grp 5
HH       1           1                    0                  2             0
JK       0           1                    1                  2             1
JE       0           1                    0                  2             1
NA       0           1                    0                  2             1
CS       0           1                    0                  2             1
JB       0           1                    0                  0             3
average  0.17        1.00                 0.17               1.67          1.17

Summary by venue
Total    1           1                    2                  2             3
anec     0.50        1.00                 1.33               1.50          1.33
MedH     0.83        1.00                 0.67               1.83          2.00
QET      0.33        0.67                 0.33               0.67          0.50
SFUc     0.33        1.00                 1.00               1.67          1.17
SFUnc    0.17        1.00                 0.17               1.67          1.17



Data  Trends  and  Results: 


Overall Intelligibility, weighted average according to # of words / vowels / consonant clusters

                   Words   Vowels   Consonant Clusters
Total              39      48       51
Anechoic Chamber   0.65    0.80     0.68
Esplanade          0.42    0.64     0.46
QET                0.29    0.55     0.35
SFU Curtains       0.45    0.68     0.48
SFU No Curtains    0.40    0.66     0.49
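
Weighting each excerpt's proportion by its number of words (or vowels, or consonant clusters) is arithmetically the same as dividing the total items identified by the total items possible. The Python sketch below illustrates that pooling, assuming this is how the weighted averages were formed. It uses only the anechoic group averages for Excerpts 5 and 6 listed in this appendix, so it is not expected to reproduce the Anechoic Chamber row above, which pools all excerpts. The function name weighted_proportion is illustrative.

```python
def weighted_proportion(excerpts):
    """Pool per-excerpt results, weighting each excerpt by its item count.

    `excerpts` is a list of (mean_items_correct, items_possible) pairs.
    Weighting each proportion by items_possible equals total correct
    divided by total possible.
    """
    correct = sum(mean_correct for mean_correct, _ in excerpts)
    possible = sum(total for _, total in excerpts)
    return correct / possible


# Anechoic group averages for two excerpts only (Excerpt 5, Grp 2 and
# Excerpt 6, Grp 1), given as (mean correct, total possible).
anechoic_words = [(4.33, 6), (3.17, 7)]
anechoic_vowels = [(6.33, 7), (5.67, 8)]
anechoic_clusters = [(5.83, 8), (4.33, 9)]

for label, data in [("words", anechoic_words),
                    ("vowels", anechoic_vowels),
                    ("consonant clusters", anechoic_clusters)]:
    print(f"{label}: {weighted_proportion(data):.2f}")
```

Pooling in this way keeps a short excerpt from counting as heavily as a long one, which a plain average of the per-excerpt proportions would not do.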

Intelligibility of various consonant sounds

                   Fricatives  Plosives (Unvoiced)  Plosives (Voiced)  Approximants  Nasals
Total              24          10                   4                  9             14
Anechoic           0.85        0.74                 0.71               0.94          0.62
Esplanade          0.78        0.45                 0.42               0.75          0.47
QET                0.68        0.33                 0.25               0.65          0.42
SFU Curtains       0.79        0.60                 0.38               0.69          0.42
SFU No Curtains    0.75        0.57                 0.25               0.78          0.37
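
One way to read the consonant table is to subtract each venue's score from the anechoic score for the same consonant type, which shows how much intelligibility each type loses once the room is added. The Python sketch below does this with the figures listed above. The comparison against the anechoic row is an assumption made here for illustration, not a calculation described in the essay, and the function name loss_vs_anechoic is hypothetical.

```python
CONSONANT_TYPES = ["fricatives", "unvoiced plosives", "voiced plosives",
                   "approximants", "nasals"]

# Proportion of each consonant type identified, per venue (from the table).
SCORES = {
    "Anechoic": [0.85, 0.74, 0.71, 0.94, 0.62],
    "Esplanade": [0.78, 0.45, 0.42, 0.75, 0.47],
    "QET": [0.68, 0.33, 0.25, 0.65, 0.42],
    "SFU Curtains": [0.79, 0.60, 0.38, 0.69, 0.42],
    "SFU No Curtains": [0.75, 0.57, 0.25, 0.78, 0.37],
}


def loss_vs_anechoic(scores, baseline="Anechoic"):
    """Return, per venue, the drop from the baseline for each consonant type."""
    base = scores[baseline]
    return {
        venue: [round(b - v, 2) for b, v in zip(base, values)]
        for venue, values in scores.items()
        if venue != baseline
    }


losses = loss_vs_anechoic(SCORES)
for venue, row in losses.items():
    paired = ", ".join(f"{name} {drop:.2f}"
                       for name, drop in zip(CONSONANT_TYPES, row))
    print(f"{venue}: {paired}")
```

With these figures the plosives show the largest drops in every hall, and the fricatives the smallest.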

Graphs  of  Data  Results: 


[Figure: Correlation: RT and Intelligibility. Horizontal axis: RT (s). Series: RT at 500 Hz, RT at 1 kHz.]

[Figure: Correlation: EDT and Intelligibility. Horizontal axis: EDT (s). Series: EDT at 500 Hz, EDT at 1 kHz.]

[Figure: Correlation: C80 and Intelligibility. Horizontal axis: C80 (dB). Vertical axis scaled 0.00 to 1.00. Series: C80 at 500 Hz, C80 at 1 kHz.]

[Figure: Correlation: D50 and Intelligibility. Horizontal axis: D50 (%). Series: D50 at 500 Hz, D50 at 1 kHz.]

[Figure: Text Intelligibility in Various Acoustic Spaces. Horizontal axis: Venue. Vertical axis scaled 0.00 to 1.00. Series: Anechoic Chamber, Esplanade, QET, SFU Curtains, SFU No Curtains.]

[Figure: Intelligibility of Consonant Types. Series: Anechoic, Esplanade, QET, SFU Curtains, SFU No Curtains.]

[Figure: Intelligibility of Vowels and Consonants. Categories: Vowels, Consonant Clusters. Series: Anechoic, Esplanade, QET, SFU Curtains, SFU No Curtains.]


